## **Amal Ahmed (Ed.)**

# **Programming Languages and Systems**

**27th European Symposium on Programming, ESOP 2018 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018 Thessaloniki, Greece, April 14–20, 2018, Proceedings**

## Lecture Notes in Computer Science 10801

Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

### Editorial Board

David Hutchison, UK Josef Kittler, UK Friedemann Mattern, Switzerland Moni Naor, Israel Bernhard Steffen, Germany Doug Tygar, USA

Takeo Kanade, USA Jon M. Kleinberg, USA John C. Mitchell, USA C. Pandu Rangan, India Demetri Terzopoulos, USA Gerhard Weikum, Germany

### Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany Benjamin C. Pierce, University of Pennsylvania, USA Bernhard Steffen, University of Dortmund, Germany Deng Xiaotie, City University of Hong Kong Jeannette M. Wing, Microsoft Research, Redmond, WA, USA More information about this series at http://www.springer.com/series/7407


Editor Amal Ahmed Northeastern University Boston, MA USA

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-319-89883-4 ISBN 978-3-319-89884-1 (eBook) https://doi.org/10.1007/978-3-319-89884-1

Library of Congress Control Number: 2018940640

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG part of Springer Nature

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

### ETAPS Foreword

Welcome to the proceedings of ETAPS 2018! After a somewhat coldish ETAPS 2017 in Uppsala in the north, ETAPS this year took place in Thessaloniki, Greece. I am happy to announce that this is the first ETAPS with gold open access proceedings. This means that all papers are accessible by anyone for free.

ETAPS 2018 was the 21st instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. Each conference has its own Program Committee (PC) and its own Steering Committee. The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations to programming language developments, analysis tools, formal approaches to software engineering, and security. Organizing these conferences in a coherent, highly synchronized conference program facilitates participation in an exciting event, offering attendees the possibility to meet many researchers working in different directions in the field, and to easily attend talks of different conferences. Before and after the main conference, numerous satellite workshops take place and attract many researchers from all over the globe.

ETAPS 2018 received 479 submissions in total, 144 of which were accepted, yielding an overall acceptance rate of 30%. I thank all the authors for their interest in ETAPS, all the reviewers for their peer reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2018 was enriched by the unifying invited speaker Martin Abadi (Google Brain, USA) and the conference-specific invited speakers (FASE) Pamela Zave (AT & T Labs, USA), (POST) Benjamin C. Pierce (University of Pennsylvania, USA), and (ESOP) Derek Dreyer (Max Planck Institute for Software Systems, Germany). Invited tutorials were provided by Armin Biere (Johannes Kepler University, Linz, Austria) on modern SAT solving and Fabio Somenzi (University of Colorado, Boulder, USA) on hardware verification. My sincere thanks to all these speakers for their inspiring and interesting talks!

ETAPS 2018 took place in Thessaloniki, Greece, and was organised by the Department of Informatics of the Aristotle University of Thessaloniki. The university was founded in 1925 and currently has around 75,000 students; it is the largest university in Greece. ETAPS 2018 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Panagiotis Katsaros (general chair), Ioannis Stamelos, Lefteris Angelis, George Rahonis, Nick Bassiliades, Alexander Chatzigeorgiou, Ezio Bartocci, Simon Bliudze, Emmanouela Stachtiari, Kyriakos Georgiadis, and Petros Stratis (EasyConferences).

The overall planning for ETAPS is the main responsibility of the Steering Committee, and in particular of its Executive Board. The ETAPS Steering Committee consists of an Executive Board and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Tallinn), and Lenore Zuck (Chicago). Other members of the Steering Committee are: Wil van der Aalst (Aachen), Parosh Abdulla (Uppsala), Amal Ahmed (Boston), Christel Baier (Dresden), Lujo Bauer (Pittsburgh), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Luis Caires (Lisbon), Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Marieke Huisman (Twente), Panagiotis Katsaros (Thessaloniki), Ralf Küsters (Stuttgart), Ugo Dal Lago (Bologna), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), Andrew M. Pitts (Cambridge), Alessandra Russo (London), Dave Sands (Göteborg), Don Sannella (Edinburgh), Andy Schürr (Darmstadt), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendees, organizers of the satellite workshops, and Springer for their support. I hope you all enjoy the proceedings of ETAPS 2018. Finally, a big thanks to Panagiotis and his local organization team for all their enormous efforts that led to a fantastic ETAPS in Thessaloniki!

February 2018 Joost-Pieter Katoen

### Preface

This volume contains the papers presented at the 27th European Symposium on Programming (ESOP 2018) held April 16–19, 2018, in Thessaloniki, Greece. ESOP is one of the European Joint Conferences on Theory and Practice of Software (ETAPS). It is devoted to fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

The 36 papers in this volume were selected from 114 submissions based on originality and quality. Each submission was reviewed by three to six Program Committee (PC) members and external reviewers, with an average of 3.3 reviews per paper. Authors were given a chance to respond to these reviews during the rebuttal period from December 6 to 8, 2017. All submissions, reviews, and author responses were considered during the online discussion, which identified 74 submissions to be discussed further at the physical PC meeting held at Inria Paris, December 13–14, 2017. Each paper was assigned a guardian, who was responsible for making sure that external reviews were solicited if there was not enough non-conflicted expertise among the PC, and for presenting a summary of the reviews and author responses at the PC meeting. All non-conflicted PC members participated in the discussion of a paper's merits. PC members wrote reactions to author responses, including summaries of online discussions and discussions during the physical PC meeting, so as to help the authors understand decisions. Papers co-authored by members of the PC were held to a higher standard and discussed toward the end of the physical PC meeting. There were ten such submissions and five were accepted. Papers for which the program chair had a conflict of interest were kindly handled by Fritz Henglein.

My sincere thanks to all who contributed to the success of the conference. This includes the authors who submitted papers for consideration; the external reviewers, who provided timely expert reviews, sometimes on short notice; and the PC, who worked hard to provide extensive reviews, engaged in high-quality discussions about the submissions, and added detailed comments to help authors understand the PC discussion and decisions. I am grateful to the past ESOP PC chairs, particularly Jan Vitek and Hongseok Yang, and to the ESOP SC chairs, Giuseppe Castagna and Peter Thiemann, who helped with numerous procedural matters. I would like to thank the ETAPS SC chair, Joost-Pieter Katoen, for his amazing work and his responsiveness. HotCRP was used to handle submissions and online discussion, and helped smoothly run the physical PC meeting. Finally, I would like to thank Cătălin Hriţcu for sponsoring the physical PC meeting through ERC grant SECOMP, Mathieu Mourey and the Inria Paris staff for their help organizing the meeting, and William Bowman for assisting with the PC meeting.

February 2018 Amal Ahmed

### Organization

### Program Committee


### Additional Reviewers

Danel Ahman, S. Akshay, Aws Albarghouthi, Jade Alglave, Vincenzo Arceri, Samik Basu, Gavin Bierman, Filippo Bonchi, Thierry Coquand, Mariangiola Dezani, Derek Dreyer, Ronald Garcia, Deepak Garg, Samir Genaim, Victor Gomes, Peter Habermehl, Matthew Hague, Justin Hsu, Zhenjiang Hu, Peter Jipsen, Shin-ya Katsumata, Andrew Kennedy, Heidy Khlaaf, Neelakantan Krishnaswami, César Kunz, Ugo Dal Lago, Paul Levy, Kenji Maillard, Roman Manevich, Paulo Mateus, Antoine Miné, Stefan Monnier, Andrzej Murawski, Anders Møller, Vivek Notani, Andreas Nuyts, Paulo Oliva, Dominic Orchard, Luca Padovani, Brigitte Pientka, Benjamin C. Pierce, Andreas Podelski, Chris Poskitt, Francesco Ranzato, Andrey Rybalchenko, Sriram Sankaranarayanan, Tetsuya Sato, Sandro Stucki, Zachary Tatlock, Bernardo Toninho, Viktor Vafeiadis

### RustBelt: Logical Foundations for the Future of Safe Systems Programming

Derek Dreyer

Max Planck Institute for Software Systems (MPI-SWS), Germany dreyer@mpi-sws.org

Abstract. Rust is a new systems programming language, developed at Mozilla, that promises to overcome the seemingly fundamental tradeoff in language design between high-level safety guarantees and low-level control over resource management. Unfortunately, none of Rust's safety claims have been formally proven, and there is good reason to question whether they actually hold. Specifically, Rust employs a strong, ownership-based type system, but then extends the expressive power of this core type system through libraries that internally use unsafe features.

In this talk, I will present RustBelt (http://plv.mpi-sws.org/rustbelt), the first formal (and machine-checked) safety proof for a language representing a realistic subset of Rust. Our proof is extensible in the sense that, for each new Rust library that uses unsafe features, we can say what verification condition it must satisfy in order for it to be deemed a safe extension to the language. We have carried out this verification for some of the most important libraries that are used throughout the Rust ecosystem.

After reviewing some essential features of the Rust language, I will describe the high-level structure of the RustBelt verification and then delve into detail about the secret weapon that makes RustBelt possible: the Iris framework for higher-order concurrent separation logic in Coq (http://iris-project.org). I will explain by example how Iris generalizes the expressive power of O'Hearn's original concurrent separation logic in ways that are essential for verifying the safety of Rust libraries. I will not assume any prior familiarity with concurrent separation logic or Rust.

This is joint work with Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and the rest of the Iris team.

### Contents

### Language Design


Ningning Xie and Bruno C. d. S. Oliveira



### Program Analysis and Automated Verification



### Compiler Verification


Language Design

### **Consistent Subtyping for All**

Ningning Xie(B), Xuan Bi, and Bruno C. d. S. Oliveira

The University of Hong Kong, Pokfulam, Hong Kong {nnxie,xbi,bruno}@cs.hku.hk

**Abstract.** Consistent subtyping is employed in some gradual type systems to validate type conversions. The original definition by Siek and Taha serves as a guideline for designing gradual type systems with subtyping. Polymorphic types à la System F also induce a subtyping relation that relates polymorphic types to their instantiations. However, Siek and Taha's definition is not adequate for polymorphic subtyping. The first goal of this paper is to propose a generalization of consistent subtyping that is adequate for polymorphic subtyping, and subsumes the original definition by Siek and Taha. The new definition of consistent subtyping provides novel insights with respect to previous polymorphic gradual type systems, which did not employ consistent subtyping. The second goal of this paper is to present a gradually typed calculus for implicit (higher-rank) polymorphism that uses our new notion of consistent subtyping. We develop both declarative and (bidirectional) algorithmic versions for the type system. We prove that the new calculus satisfies all static aspects of the refined criteria for gradual typing, which are mechanically formalized using the Coq proof assistant.

### **1 Introduction**

Gradual typing [21] is an increasingly popular topic in both programming language practice and theory. On the practical side there is a growing number of programming languages adopting gradual typing. Those languages include Clojure [6], Python [27], TypeScript [5], Hack [26], and the addition of Dynamic to C# [4], to cite a few. On the theoretical side, recent years have seen a large body of research that defines the foundations of gradual typing [8,9,13], explores their use for both functional and object-oriented programming [21,22], as well as its applications to many other areas [3,24].

A key concept in gradual type systems is *consistency* [21]. Consistency weakens type equality to allow for the presence of *unknown* types. In some gradual type systems with subtyping, consistency is combined with subtyping to give rise to the notion of *consistent subtyping* [22]. Consistent subtyping is employed by gradual type systems to validate type conversions arising from conventional subtyping. One nice feature of consistent subtyping is that it is derivable from the more primitive notions of *consistency* and *subtyping*. As Siek and Taha [22] put it, this shows that "*gradual typing and subtyping are orthogonal and can be combined in a principled fashion*". Thus consistent subtyping is often used as a guideline for designing gradual type systems with subtyping.
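To make consistency concrete, the following minimal Python sketch encodes gradual types and the consistency relation for a simple language. The encoding is ours, purely for illustration (not the paper's formalization): base types are the strings `'Int'` and `'Bool'`, the unknown type ⋆ is `'?'`, and function types are tuples `('->', dom, cod)`.

```python
# Consistency (~) is structural equality, except that the unknown type '?'
# is consistent with any type. Illustrative encoding, not the paper's.

def consistent(a, b):
    """Return True iff gradual types a and b are consistent (a ~ b)."""
    if a == '?' or b == '?':
        return True                       # '?' tolerates anything
    if isinstance(a, tuple) and isinstance(b, tuple):
        # function types: consistent componentwise
        return consistent(a[1], b[1]) and consistent(a[2], b[2])
    return a == b                         # base types: plain equality

# ? -> Int is consistent with Bool -> Int, but Int is not consistent with Bool.
print(consistent(('->', '?', 'Int'), ('->', 'Bool', 'Int')))  # True
print(consistent('Int', 'Bool'))                              # False
```

Note that consistency is not transitive: `'Int' ~ '?'` and `'?' ~ 'Bool'` hold, yet `'Int' ~ 'Bool'` does not, which is exactly why it weakens, rather than replaces, type equality.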

Unfortunately, as noted by Garcia et al. [13], notions of consistency and/or consistent subtyping "*become more difficult to adapt as type systems get more complex*". In particular, for the case of type systems with subtyping, certain kinds of subtyping do not fit well with the original definition of consistent subtyping by Siek and Taha [22]. One important case where such a mismatch happens is in type systems supporting implicit (higher-rank) polymorphism [11,18]. It is well-known that polymorphic types à la System F induce a subtyping relation that relates polymorphic types to their instantiations [16,17]. However, Siek and Taha's [22] definition is not adequate for this kind of subtyping. Moreover the current framework for *Abstracting Gradual Typing* (AGT) [13] also does not account for polymorphism, with the authors acknowledging that this is one of the interesting avenues for future work.

Existing work on gradual type systems with polymorphism does not use consistent subtyping. The Polymorphic Blame Calculus (λB) [1] is an *explicitly* polymorphic calculus with explicit casts, which is often used as a target language for gradual type systems with polymorphism. In λB a notion of *compatibility* is employed to validate conversions allowed by casts. Interestingly λB *allows conversions from polymorphic types to their instantiations*. For example, it is possible to cast a value with type ∀a.a → a into Int → Int. Thus an important remark here is that while λB is explicitly polymorphic, casting and conversions are closer to *implicit* polymorphism. That is, in a conventional explicitly polymorphic calculus (such as System F), the primary notion is type equality, where instantiation is not taken into account. Thus the types ∀a.a → a and Int → Int are deemed *incompatible*. However in *implicitly* polymorphic calculi [11,18] ∀a.a → a and Int → Int are deemed *compatible*, since the latter type is an instantiation of the former. Therefore λB is in a sense a hybrid between implicit and explicit polymorphism, utilizing type equality (à la System F) for validating applications, and *compatibility* for validating casts.

An alternative approach to polymorphism has recently been proposed by Igarashi et al. [14]. Like λB their calculus is explicitly polymorphic. However, in that work they employ type consistency to validate cast conversions, and forbid conversions from ∀a.a → a to Int → Int. This makes their casts closer to explicit polymorphism, in contrast to λB. Nonetheless, there is still some flavour of implicit polymorphism in their calculus when it comes to interactions between dynamically typed and polymorphically typed code. For example, in their calculus type consistency allows types such as ∀a.a → Int to be related to ⋆ → Int, where some sort of (implicit) polymorphic subtyping is involved.

The first goal of this paper is to study the gradually typed subtyping and consistent subtyping relations for *predicative implicit polymorphism*. To accomplish this, we first show how to reconcile consistent subtyping with polymorphism by generalizing the original consistent subtyping definition by Siek and Taha [22]. The new definition of consistent subtyping can deal with polymorphism, and preserves the orthogonality between consistency and subtyping. To slightly rephrase Siek and Taha [22], the motto of our paper is that:

### *Gradual typing and* **polymorphism** *are orthogonal and can be combined in a principled fashion.*<sup>1</sup>

With the insights gained from our work, we argue that, for implicit polymorphism, Ahmed et al.'s [1] notion of compatibility is too permissive (i.e. too many programs are allowed to type-check), and that Igarashi et al.'s [14] notion of type consistency is too conservative. As a step towards an algorithmic version of consistent subtyping, we present a syntax-directed version of consistent subtyping that is sound and complete with respect to our formal definition of consistent subtyping. The syntax-directed version of consistent subtyping is remarkably simple and well-behaved, without the ad-hoc *restriction* operator [22]. Moreover, to further illustrate the generality of our consistent subtyping definition, we show that it can also account for *top types*, which cannot be dealt with by Siek and Taha's [22] definition either.

The second goal of this paper is to present a (source-level) gradually typed calculus for (predicative) implicit higher-rank polymorphism that uses our new notion of consistent subtyping. As far as we are aware, there is no work on bridging the gap between implicit higher-rank polymorphism and gradual typing, which is interesting for two reasons. On one hand, modern functional languages (such as Haskell) employ sophisticated type-inference algorithms that, aided by type annotations, can deal with implicit higher-rank polymorphism. So a natural question is how gradual typing can be integrated in such languages. On the other hand, there is existing work on integrating *explicit* polymorphism into gradual typing [1,14]. Yet no work investigates how to move such expressive power into a source language with implicit polymorphism. Therefore as a step towards gradualizing such type systems, this paper develops both declarative and algorithmic versions of a gradual type system with implicit higher-rank polymorphism. The new calculus brings the expressive power of full implicit higher-rank polymorphism into a gradually typed source language. We prove that our calculus satisfies all of the *static* aspects of the refined criteria for gradual typing [25], while discussing some issues related with the *dynamic guarantee*.

In summary, the contributions of this paper are:

	- a new definition of consistent subtyping that subsumes and generalizes that of Siek and Taha [22], and can deal with polymorphism and top types.
	- a syntax-directed version of consistent subtyping that is sound and complete with respect to our definition of consistent subtyping, but still guesses polymorphic instantiations.

<sup>1</sup> Note here that we borrow Siek and Taha's [22] motto mostly to talk about the static semantics. As Ahmed et al. [1] show there are several non-trivial interactions between polymorphism and casts at the level of the dynamic semantics.

**Fig. 1.** Subtyping ($A <: B$) and type consistency in **FOb**⋆<:


### **2 Background and Motivation**

In this section we review a simple gradually typed language with objects [22], to introduce the concept of consistent subtyping. We also briefly discuss the Odersky-Läufer type system for higher-rank types [17], which serves as the original language on which our gradually typed calculus with implicit higher-rank polymorphism is based.

### **2.1 Gradual Subtyping**

Siek and Taha [22] developed a gradually typed system for object-oriented languages that they call **FOb**⋆<:. Central to gradual typing is the concept of *consistency* (written ∼) between gradual types, which are types that may involve the unknown type ⋆. The intuition is that consistency relaxes the structure of a type system to tolerate unknown positions in a gradual type. They also defined the subtyping relation in a way that static type safety is preserved. Their key

<sup>2</sup> All supplementary materials are available at https://bitbucket.org/xieningning/consistent-subtyping.

insight is that the unknown type ⋆ is neutral to subtyping, with only ⋆ <: ⋆. Both relations are found in Fig. 1.

A primary contribution of their work is to show that consistency and subtyping are orthogonal. To compose subtyping and consistency, Siek and Taha [22] defined *consistent subtyping* (written ≲) in two equivalent ways:

#### **Definition 1 (Consistent Subtyping à la Siek and Taha** [22]**)**

*–* A ≲ B *if and only if* A ∼ C *and* C <: B *for some* C*.*

*–* A ≲ B *if and only if* A <: C *and* C ∼ B *for some* C*.*

Both definitions are non-deterministic because of the intermediate type C. To remove the non-determinism, they proposed a so-called *restriction operator*, written $A|_B$, that masks off the parts of a type A that are unknown in a type B.

$$
\begin{aligned}
A|_B = \text{case}\ (A, B)\ \text{of}\quad
&\ (-, \star) \Rightarrow \star \\
\mid&\ (A_1 \to A_2,\ B_1 \to B_2) \Rightarrow A_1|_{B_1} \to A_2|_{B_2} \\
\mid&\ ([l_1 : A_1, \ldots, l_n : A_n],\ [l_1 : B_1, \ldots, l_m : B_m])\ \text{if}\ n \le m \Rightarrow [l_1 : A_1|_{B_1}, \ldots, l_n : A_n|_{B_n}] \\
\mid&\ ([l_1 : A_1, \ldots, l_n : A_n],\ [l_1 : B_1, \ldots, l_m : B_m])\ \text{if}\ n > m \Rightarrow \\
&\qquad [l_1 : A_1|_{B_1}, \ldots, l_m : A_m|_{B_m},\ l_{m+1} : A_{m+1}, \ldots, l_n : A_n] \\
\mid&\ \text{otherwise} \Rightarrow A
\end{aligned}
$$

With the restriction operator, consistent subtyping is simply defined as $A \lesssim B \equiv A|_B <: B|_A$. Then they proved that this definition is equivalent to Definition 1.
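The masking construction can be sketched executably. The following Python fragment is our own illustrative encoding for base and function types only (no records, so the width-subtyping cases of the restriction operator are omitted): `'?'` is the unknown type and `('->', dom, cod)` a function type.

```python
# Consistent subtyping via the restriction operator, sketched for base and
# function types only. Encoding and simplifications are ours.

def restrict(a, b):
    """A|B: replace by '?' exactly those parts of a that are unknown in b."""
    if b == '?':
        return '?'
    if isinstance(a, tuple) and isinstance(b, tuple):
        return ('->', restrict(a[1], b[1]), restrict(a[2], b[2]))
    return a

def subtype(a, b):
    """Static subtyping; '?' is neutral, i.e. only ? <: ?."""
    if a == '?' or b == '?':
        return a == b
    if isinstance(a, tuple) and isinstance(b, tuple):
        # contravariant in the domain, covariant in the codomain
        return subtype(b[1], a[1]) and subtype(a[2], b[2])
    return a == b   # no base-type subtyping in this sketch

def consistent_subtype(a, b):
    """A <~ B, defined as A|B <: B|A."""
    return subtype(restrict(a, b), restrict(b, a))

# Masking makes ? -> Int and Bool -> Int line up, so the check succeeds.
print(consistent_subtype(('->', '?', 'Int'), ('->', 'Bool', 'Int')))  # True
```

The determinism is visible here: no intermediate type C is guessed; both sides are masked and compared once.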

#### **2.2 The Odersky-Läufer Type System**

The calculus we are combining gradual typing with is the well-established predicative type system for higher-rank types proposed by Odersky and Läufer [17]. One difference is that, for simplicity, we do not account for a let expression, as there is existing work on gradual type systems with let expressions and let generalization (for example, see Garcia and Cimini [12]). Similar techniques can be applied to our calculus to enable let generalization.

The syntax of the type system, along with the typing and subtyping judgments, is given in Fig. 2. An implicit assumption throughout the paper is that variables in contexts are distinct. We save the explanations of the static semantics for Sect. 4, where we present our gradually typed version of the calculus.

#### **2.3 Motivation: Gradually Typed Higher-Rank Polymorphism**

Our work combines implicit (higher-rank) polymorphism with gradual typing. As is well known, a gradually typed language supports both fully static and fully dynamic checking of program properties, as well as the continuum between these two extremes. It also offers programmers fine-grained control over the static-to-dynamic spectrum, i.e., a program can be evolved by introducing more or less precise types as needed [13].

**Fig. 2.** Syntax and static semantics of the Odersky-Läufer type system.

Haskell is a language that supports implicit higher-rank polymorphism, but no gradual typing. Therefore some programs that are safe at run-time may be rejected due to the conservativity of the type system. For example, consider the following Haskell program adapted from Jones et al. [18]:

```haskell
foo :: ([Int], [Char])
foo = let f x = (x [1, 2], x ['a', 'b']) in f reverse
```

This program is rejected by Haskell's type checker because Haskell implements the Damas-Milner rule that a lambda-bound argument (such as *x*) can only have a monotype, i.e., the type checker can only assign *x* the type [**Int**] → [**Int**] or [**Char**] → [**Char**], but not ∀a.[a] → [a]. Finding such manual polymorphic annotations can be non-trivial. Instead of rejecting the program outright due to missing type annotations, gradual typing provides a simple alternative by giving x the unknown type (denoted ⋆). With such typing the same program type-checks and produces ([2, 1], ['b', 'a']). By running the program, programmers can gain some additional insight about the run-time behaviour. Then, with such insight, they can also give x a more precise type (∀a.[a] → [a]) a posteriori so that the program continues to type-check via implicit polymorphism and also grants

**Fig. 3.** Syntax of types, consistency, and subtyping in the declarative system.

more static safety. In this paper, we envision such a language that combines the benefits of both implicit higher-rank polymorphism and gradual typing.

### **3 Revisiting Consistent Subtyping**

In this section we explore the design space of consistent subtyping. We start with the definitions of consistency and subtyping for polymorphic types, and compare with some relevant work. We then discuss the design decisions involved towards our new definition of consistent subtyping, and justify the new definition by demonstrating its equivalence with that of Siek and Taha [22] and the AGT approach [13] on simple types.

The syntax of types is given at the top of Fig. 3. We write A, B for types. Types are either the integer type Int, type variables a, function types A → B, universal quantification ∀a.A, or the unknown type ⋆. Though we only have one base type Int, we also use Bool for the purpose of illustration. Note that monotypes τ contain all types other than the universal quantifier and the unknown type ⋆. We will discuss this restriction when we present the subtyping rules. Contexts Ψ are *ordered* lists of type variable declarations and term variables.

#### **3.1 Consistency and Subtyping**

We start by giving the definitions of consistency and subtyping for polymorphic types, and comparing our definitions with the compatibility relation by Ahmed et al. [1] and type consistency by Igarashi et al. [14].

*Consistency.* The key observation here is that consistency is mostly a structural relation, except that the unknown type ⋆ can be regarded as any type. Following this observation, we naturally extend the definition from Fig. 1 with polymorphic types, as shown in the middle of Fig. 3. In particular a polymorphic type ∀a.A is consistent with another polymorphic type ∀a.B if A is consistent with B.
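The extension is small enough to sketch in a few lines of Python. The encoding is ours, not the paper's: `('all', var, body)` stands for a ∀-type, and, as a simplification, the two quantified types are assumed to share the same binder name (no α-renaming).

```python
# Consistency extended with polymorphic types: ∀a.A ~ ∀a.B whenever A ~ B.
# Illustrative encoding (ours); binders are assumed to coincide syntactically.

def consistent_poly(a, b):
    if a == '?' or b == '?':
        return True
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == 'all':
        return consistent_poly(a[2], b[2])      # descend under the quantifier
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == '->':
        return consistent_poly(a[1], b[1]) and consistent_poly(a[2], b[2])
    return a == b

# ∀a. a -> ?  ~  ∀a. a -> Int: unknown positions are tolerated under ∀.
print(consistent_poly(('all', 'a', ('->', 'a', '?')),
                      ('all', 'a', ('->', 'a', 'Int'))))   # True
# But ∀a. a -> a is NOT consistent with Int -> Int: instantiation is the
# business of subtyping, not of consistency.
print(consistent_poly(('all', 'a', ('->', 'a', 'a')),
                      ('->', 'Int', 'Int')))                # False
```

The second example illustrates the orthogonality the paper argues for: consistency stays purely structural, and relating a ∀-type to its instantiations is left entirely to subtyping.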

*Subtyping.* We express the fact that one type is a polymorphic generalization of another by means of the subtyping judgment Ψ ⊢ A <: B. Compared with the subtyping rules of Odersky and Läufer [17] in Fig. 2, the only addition is the neutral subtyping of ⋆. Notice that, in the rule S-ForallL, the universal quantifier is only allowed to be instantiated with a *monotype*. The judgment Ψ ⊢ τ checks that all the type variables in τ are bound in the context Ψ. For space reasons, we omit the definition. According to the syntax in Fig. 3, monotypes do not include the unknown type ⋆. This is because if we were to allow the unknown type to be used for instantiation, we could have ∀a.a → a <: ⋆ → ⋆ by instantiating a with ⋆. Since ⋆ → ⋆ is consistent with any function type A → B, for instance Int → Bool, this means that we could provide an expression of type ∀a.a → a to a function whose input type is supposed to be Int → Bool. However, as we might expect, ∀a.a → a is definitely not compatible with Int → Bool. This does not hold in any polymorphic type system without gradual typing, so the gradual type system should not accept it either. (This is the so-called *conservative extension* property that will be made precise in Sect. 4.3.)

Importantly, there is a subtle but crucial distinction between a type variable and the unknown type, although both represent a kind of "arbitrary" type. The unknown type stands for the absence of type information: it could be *any type* at *any instance*. Therefore, the unknown type is consistent with any type, and additional type checks have to be performed at runtime. A type variable, on the other hand, indicates *parametricity*: it can only be instantiated to a single type. For example, in the type ∀a.a → a, the two occurrences of a represent an arbitrary but single type (e.g., Int → Int, Bool → Bool), while ★ → ★ could be an arbitrary function (e.g., Int → Bool) at runtime.

*Comparison with Other Relations.* In other polymorphic gradual calculi, consistency and subtyping are often mixed up to some extent. In λB [1], the compatibility relation for polymorphic types is defined as follows:

$$
\frac{A \prec B}{A \prec \forall X.B}\;\textsc{Comp-AllR}
\qquad\qquad
\frac{A[X \mapsto \star] \prec B}{\forall X.A \prec B}\;\textsc{Comp-AllL}
$$

Notice that, in rule Comp-AllL, the universal quantifier is *always* instantiated to ★. However, in this way, λB allows ∀a.a → a ≺ Int → Bool, which, as we discussed before, is not what we expect. Indeed, λB relies on sophisticated runtime checks to rule out such instances of the compatibility relation a posteriori.

**Fig. 4.** Examples that break the original definition of consistent subtyping.

Igarashi et al. [14] introduced the so-called *quasi-polymorphic* types for types that may be used where a ∀-type is expected, which is important for their purpose of conservativity over System F. Their type consistency relation, involving polymorphism, is defined as follows<sup>3</sup>:

$$
\frac{A \sim B}{\forall a.A \sim \forall a.B}
\qquad\qquad
\frac{A \sim B \qquad B \neq \forall a.B' \qquad \star \in \mathsf{Types}(B)}{\forall a.A \sim B}
$$

Compared with our consistency definition in Fig. 3, their first rule is the same as ours. The second rule says that a non-∀ type can be consistent with a ∀-type only if it contains ★. In this way, their type system is able to reject ∀a.a → a ∼ Int → Bool. However, in order to keep conservativity, they also reject ∀a.a → a ∼ Int → Int, which is perfectly sensible in their setting (i.e., explicit polymorphism). With implicit polymorphism, however, we would expect ∀a.a → a to be related with Int → Int, since a can be instantiated to Int.

Nonetheless, when it comes to interactions between dynamically typed and polymorphically typed terms, both relations allow, for example, ∀a.a → Int to be related with ★ → Int, which in our view is a kind of (implicit) polymorphic subtyping combined with type consistency, and should be derivable from the more primitive notions in the type system (instead of inventing new relations). One of our design principles is that subtyping and consistency are *orthogonal*, and can be naturally superimposed, echoing the opinion of Siek and Taha [22].

#### **3.2 Towards Consistent Subtyping**

With the definitions of consistency and subtyping, the question now is how to compose these two relations so that two types can be compared in a way that takes these two relations into account.

<sup>3</sup> This is a simplified version.

Unfortunately, the original definition of Siek and Taha [22] (Definition 1) does not work well with our definitions of consistency and subtyping for polymorphic types. Consider two types: (∀a.a → Int) → Int and (★ → Int) → Int. The first type can reach the second type only in one way (first by applying consistency, then subtyping), but not the other way around, as shown in Fig. 4a. We use ⊥ to mean that we cannot find such an intermediate type. Similarly, there are situations where the first type can reach the second type only the other way around (first applying subtyping, and then consistency), as shown in Fig. 4b.

What is worse, if those two examples are composed so that the types all appear covariantly, then the resulting types cannot reach each other in either way. For example, Fig. 4c shows two such types, obtained by putting a Bool type in the middle, and neither order of composing consistency and subtyping works.

*Observations on Consistent Subtyping Based on Information Propagation.* In order to develop the correct definition of consistent subtyping for polymorphic types, we need to understand how consistent subtyping works. We first review two important properties of subtyping: (1) subtyping induces the subsumption rule: if A <: B, then an expression of type A can be used where B is expected; (2) subtyping is transitive: if A <: B and B <: C, then A <: C. Though consistent subtyping takes the unknown type into consideration, the subsumption rule should still apply: if A ≲ B, then an expression of type A can also be used where B is expected, given that some information may be lost by consistency. A crucial difference from subtyping is that consistent subtyping is *not* transitive, because information can only be lost once (otherwise, any two types would be consistent subtypes of each other). Now consider a situation where we have both A <: B and B ≲ C. This means that A can be used where B is expected, and B can be used where C is expected, with possibly some loss of information. In other words, we should expect that A can be used where C is expected, since there is at most a one-time loss of information.

#### **Observation 1.** *If* A <: B *and* B ≲ C*, then* A ≲ C*.*

This is reflected in Fig. 5a. A symmetrical observation is given in Fig. 5b:

#### **Observation 2.** *If* C ≲ B *and* B <: A*, then* C ≲ A*.*

From the above observations, we see what the problem is with the original definition. In Fig. 5a, if B can reach C by T₁, then by transitivity of subtyping, A can reach C by T₁. However, if B can only reach C by T₂, then A cannot reach C through the original definition. A similar problem is shown in Fig. 5b.

However, it turns out that both problems can be fixed using the same strategy: instead of taking one step of subtyping and one step of consistency, our definition of consistent subtyping allows types to take *one step of subtyping, one step of consistency, and one more step of subtyping*. Specifically, A <: B ∼ T₂ <: C (in Fig. 5a) and C <: T₁ ∼ B <: A (in Fig. 5b) have the same relation chain: subtyping, consistency, and subtyping.

**Fig. 5.** Observations of consistent subtyping

**Fig. 6.** Example that is fixed by the new definition of consistent subtyping.

*Definition of Consistent Subtyping.* From the above discussion, we are ready to modify Definition 1, and adapt it to our notation:

#### **Definition 2 (Consistent Subtyping)**

$$
\frac{\Psi \vdash A <: C \qquad C \sim D \qquad \Psi \vdash D <: B}{\Psi \vdash A \lesssim B}
$$

With Definition 2, Fig. 6 illustrates the correct relation chain for the broken example shown in Fig. 4c. At first sight, Definition 2 seems worse than the original: we need to guess *two* types! It turns out that Definition 2 is a generalization of Definition 1, and they are equivalent in the system of Siek and Taha [22]. However, more generally, Definition 2 is compatible with polymorphic types.

#### **Proposition 1 (Generalization of Consistent Subtyping)**

*In the system of Siek and Taha [22], Definition 2 is equivalent to Definition 1: for all* A, B*, we have* A ≲ B *by Definition 2 if and only if* A ≲ B *by Definition 1.*

#### **3.3 Abstracting Gradual Typing**

Garcia et al. [13] presented a new foundation for gradual typing that they call the *Abstracting Gradual Typing* (AGT) approach. In the AGT approach, gradual types are interpreted as sets of static types, where static types refer to types containing no unknown types. In this interpretation, predicates and functions on static types can then be lifted to apply to gradual types. Central to their approach is the so-called *concretization* function. For simple types, a concretization γ from gradual types to a set of static types<sup>4</sup> is defined as follows:

### **Definition 3 (Concretization)**

$$\gamma(\mathsf{Int}) = \{\mathsf{Int}\} \qquad \gamma(A \to B) = \gamma(A) \to \gamma(B) \qquad \gamma(\star) = \{\text{All static types}\}$$

Based on the concretization function, subtyping between static types can be lifted to gradual types, resulting in the consistent subtyping relation:

**Definition 4 (Consistent Subtyping in AGT).** A ≲<sub>AGT</sub> B *if and only if* A₁ <: B₁ *for some* A₁ ∈ γ(A) *and* B₁ ∈ γ(B)*.*
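Definition 4 can be executed on simple types by bounding the depth of concretization. The sketch below is ours (the paper's γ produces infinite sets); since static subtyping on these simple types degenerates to equality, the check reduces to an overlap test on the two concretization sets:

```python
# Finite-depth sketch of AGT concretization (Definition 3) for simple
# types; static types are Int, Bool, and function types.
INT, BOOL, STAR = "Int", "Bool", "star"

def arr(a, b): return ("arr", a, b)

def static_types(d):
    """All static types of nesting depth at most d."""
    types = [INT, BOOL]
    if d > 0:
        smaller = static_types(d - 1)
        types += [arr(a, b) for a in smaller for b in smaller]
    return types

def gamma(d, t):
    """Concretization, truncated at depth d for the unknown type."""
    if t == STAR:
        return static_types(d)
    if isinstance(t, tuple) and t[0] == "arr":
        return [arr(a, b) for a in gamma(d, t[1]) for b in gamma(d, t[2])]
    return [t]

def consistent_sub_agt(d, a, b):
    # Definition 4 with static subtyping as equality:
    # A is a consistent subtype of B iff gamma(A) and gamma(B) overlap.
    gb = set(gamma(d, b))
    return any(x in gb for x in gamma(d, a))
```

For example, `consistent_sub_agt(2, arr(INT, STAR), arr(INT, BOOL))` holds because `Int → Bool` lies in both concretizations.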

Later they proved that this definition of consistent subtyping coincides with that of Siek and Taha [22] (Definition 1). By Proposition 1, we can directly conclude that our definition coincides with AGT:

#### **Proposition 2 (Equivalence to AGT on Simple Types).** A ≲ B *iff* A ≲<sub>AGT</sub> B*.*

However, AGT does not yet show how to deal with polymorphism (e.g., the interpretation of type variables). Still, as noted by Garcia et al. [13], this is a promising line of future work for AGT, and the question remains whether our definition would coincide with it.

Another note related to AGT is that this definition was later adopted by Castagna and Lanvin [7], where the static types A₁, B₁ in Definition 4 can be algorithmically computed by also accounting for top and bottom types.

#### **3.4 Directed Consistency**

*Directed consistency* [15] is defined in terms of precision and static subtyping:

$$
\frac{A' \sqsubseteq A \qquad A <: B \qquad B' \sqsubseteq B}{A' \lesssim B'}
$$

The judgment A ⊑ B is read "A is less precise than B". In their setting, precision is defined for type constructors and subtyping for static types. If we interpret this definition from AGT's point of view, finding a more precise static type<sup>5</sup> has the same effect as concretization: A′ ⊑ A implies A ∈ γ(A′), and B′ ⊑ B implies B ∈ γ(B′). Therefore we consider this definition to be in the AGT style. From this perspective, it naturally coincides with Definition 2.

The value of their definition is that consistent subtyping is derived compositionally from *static subtyping* and *precision*, two more atomic relations. At first sight, their definition looks very similar to Definition 2 (replacing ⊑ by <: and <: by ∼). A question then arises as to *which one is more fundamental*. To answer it, we need to discuss the relation between consistency and precision.

<sup>4</sup> For simplicity, we directly regard the type constructor → as a set-level operator. <sup>5</sup> The definition of precision of types is given in the appendix.

*Relating Consistency and Precision.* Precision is a partial order (anti-symmetric and transitive), while consistency is symmetric but not transitive. Nonetheless, precision and consistency are related by the following proposition:

### **Proposition 3 (Consistency and Precision)**

- *If* A ∼ B*, then there exists a static type* C *such that* A ⊑ C *and* B ⊑ C*.*
- *If* A ⊑ C *and* B ⊑ C *for some static type* C*, then* A ∼ B*.*

It may seem that precision is a more atomic relation, since consistency can be derived from precision. However, recall that consistency is in fact an equivalence relation lifted from static types to gradual types. Therefore defining consistency independently is straightforward, and it is theoretically viable to validate the definition of consistency directly. On the other hand, precision is usually connected with the gradual criteria [25], and finding a correct partial order that adheres to the criteria is not always an easy task. For example, Igarashi et al. [14] argued that term precision for System F<sub>G</sub> is actually nontrivial, leaving the gradual guarantee of the semantics as a conjecture. Thus precision can be difficult to extend to more sophisticated type systems, e.g., dependent types.

Still, it is interesting that those two definitions illustrate the correspondence of different foundations (on simple types): one is defined directly on gradual types, and the other stems from AGT, which is based on static subtyping.

### **3.5 Consistent Subtyping Without Existentials**

Definition 2 serves as a fine specification of how consistent subtyping should behave in general. But it is inherently non-deterministic because of the two intermediate types C and D. As with Definition 1, we need a combined relation to directly compare two types. A natural attempt is to try to extend the restriction operator for polymorphic types. Unfortunately, as we show below, this does not work. However it is possible to devise an equivalent inductive definition instead.

*Attempt to Extend the Restriction Operator.* Suppose that we try to extend the restriction operator to account for polymorphic types. The original restriction operator is structural, meaning that it works for types of similar structure. But for polymorphic types, two input types can have different structures due to universal quantifiers, e.g., ∀a.a → Int and (Int → ★) → Int. If we try to mask the first type using the second, it seems hard to maintain the information that a should be instantiated to a function while ensuring that the return type is masked. There seems to be no satisfactory way to extend the restriction operator to support this kind of non-structural masking.

*Interpretation of the Restriction Operator and Consistent Subtyping.* If the restriction operator cannot be extended naturally, it is useful to take a step back and revisit what the restriction operator actually does. For consistent subtyping, two input types could have unknown types in different positions, but we only care about the known parts. What the restriction operator does is (1) erase

$$
\frac{a \in \Psi}{\Psi \vdash a \lesssim a}\;\textsc{CS-TVar}
\qquad
\frac{}{\Psi \vdash \mathsf{Int} \lesssim \mathsf{Int}}\;\textsc{CS-Int}
\qquad
\frac{}{\Psi \vdash \star \lesssim A}\;\textsc{CS-UnknownL}
\qquad
\frac{}{\Psi \vdash A \lesssim \star}\;\textsc{CS-UnknownR}
$$

$$
\frac{\Psi \vdash B_1 \lesssim A_1 \qquad \Psi \vdash A_2 \lesssim B_2}{\Psi \vdash A_1 \to A_2 \lesssim B_1 \to B_2}\;\textsc{CS-Fun}
\qquad
\frac{\Psi, a \vdash A \lesssim B}{\Psi \vdash A \lesssim \forall a.B}\;\textsc{CS-ForallR}
\qquad
\frac{\Psi \vdash \tau \qquad \Psi \vdash A[a \mapsto \tau] \lesssim B}{\Psi \vdash \forall a.A \lesssim B}\;\textsc{CS-ForallL}
$$

**Fig. 7.** Consistent Subtyping for implicit polymorphism.

the type information in one type if the corresponding position in the other type is the unknown type; and (2) compare the resulting types using the normal subtyping relation. The example below shows the masking-off procedure for the types Int → ★ → Bool and Int → Int → ★. Since the known parts are related by Int → ★ → ★ <: Int → ★ → ★, we conclude that Int → ★ → Bool ≲ Int → Int → ★.

$$
\mathsf{Int} \to \boxed{\star} \to \boxed{\mathsf{Bool}} \;\Rightarrow\; \mathsf{Int} \to \star \to \star
\qquad\qquad
\mathsf{Int} \to \boxed{\mathsf{Int}} \to \boxed{\star} \;\Rightarrow\; \mathsf{Int} \to \star \to \star
$$

Here the differences between the types in boxes are erased by the restriction operator. Now if we compare the boxed types directly instead of through the lens of the restriction operator, we can observe that the *consistent subtyping relation always holds between the unknown type and an arbitrary type*. We can derive this observation directly from Definition 2: the unknown type is neutral to subtyping (★ <: ★), the unknown type is consistent with any type (★ ∼ A), and subtyping is reflexive (A <: A). Therefore, *the unknown type is a consistent subtype of any type (★ ≲ A), and vice versa (A ≲ ★).* Note that this interpretation provides a general recipe for lifting a (static) subtyping relation to a (gradual) consistent subtyping relation, as discussed below.
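For simple types, the masking procedure above can be sketched directly (the encoding is ours, not the paper's; static subtyping on these simple types is plain equality):

```python
# Restriction (masking) operator on simple types: mask(A, B) erases
# positions of A where B has the unknown type.
INT, BOOL, STAR = "Int", "Bool", "star"

def arr(a, b): return ("arr", a, b)

def mask(a, b):
    if b == STAR:
        return STAR
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == "arr" and b[0] == "arr"):
        return arr(mask(a[1], b[1]), mask(a[2], b[2]))
    return a

def consistent_sub(a, b):
    # Compare the two masked types with static subtyping (equality here).
    return mask(a, b) == mask(b, a)
```

Running it on the example in the text, both `Int → ★ → Bool` and `Int → Int → ★` mask to `Int → ★ → ★`, so the consistent subtyping check succeeds.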

*Defining Consistent Subtyping Directly.* From the above discussion, we can define the consistent subtyping relation directly, *without* resorting to subtyping or consistency at all. The key idea is that we replace <: with ≲ in the subtyping rules of Fig. 3, get rid of rule S-Unknown, and add two extra rules concerning ★, resulting in the rules of consistent subtyping in Fig. 7. Of particular interest are the rules CS-UnknownL and CS-UnknownR, both of which correspond to what we just observed: the unknown type is a consistent subtype of any type, and vice versa. From now on, we use the symbol ≲ to refer to the consistent subtyping relation in Fig. 7. What is more, we can prove that the two formulations are equivalent<sup>6</sup>:
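The quantifier-free fragment of the rules in Fig. 7 can be sketched as a direct recursive check (our own encoding; the ∀ rules are omitted because they require guessing monotypes, which the algorithmic system of Sect. 5 handles with existential variables):

```python
# Direct consistent subtyping, quantifier-free fragment of Fig. 7.
# ctx is the list of bound type variable names.
INT, STAR = "Int", "star"

def arr(a, b): return ("arr", a, b)
def var(a): return ("var", a)

def csub(ctx, a, b):
    if a == STAR or b == STAR:               # CS-UnknownL / CS-UnknownR
        return True
    if isinstance(a, tuple) and isinstance(b, tuple):
        if a[0] == "var" and b[0] == "var":  # CS-TVar: bound in ctx
            return a[1] == b[1] and a[1] in ctx
        if a[0] == "arr" and b[0] == "arr":  # CS-Fun: domain contravariant
            return csub(ctx, b[1], a[1]) and csub(ctx, a[2], b[2])
    return a == b == INT                     # CS-Int
```

Note the contravariant flip in the domain of CS-Fun, inherited from ordinary subtyping.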

<sup>T</sup>**Theorem 1.** Ψ ⊢ A ≲ B ⇔ Ψ ⊢ A <: C*,* C ∼ D*,* Ψ ⊢ D <: B *for some* C, D*.*

<sup>6</sup> Theorems marked with <sup>T</sup> are proved in Coq; the same applies to Lemmas marked with <sup>L</sup>.

$$
\frac{(x : A) \in \Psi}{\Psi \vdash x : A \rightsquigarrow x}\;\textsc{Var}
\qquad
\frac{}{\Psi \vdash n : \mathsf{Int} \rightsquigarrow n}\;\textsc{Nat}
\qquad
\frac{\Psi, a \vdash e : A \rightsquigarrow s}{\Psi \vdash e : \forall a.A \rightsquigarrow \Lambda a.\, s}\;\textsc{Gen}
$$

$$
\frac{\Psi, x : A \vdash e : B \rightsquigarrow s}{\Psi \vdash \lambda x{:}A.\, e : A \to B \rightsquigarrow \lambda x{:}A.\, s}\;\textsc{LamAnn}
\qquad
\frac{\Psi, x : \tau \vdash e : B \rightsquigarrow s}{\Psi \vdash \lambda x.\, e : \tau \to B \rightsquigarrow \lambda x{:}\tau.\, s}\;\textsc{Lam}
$$

$$
\frac{\Psi \vdash e_1 : A \rightsquigarrow s_1 \qquad \Psi \vdash A \triangleright A_1 \to A_2 \qquad \Psi \vdash e_2 : A_3 \rightsquigarrow s_2 \qquad \Psi \vdash A_3 \lesssim A_1}
{\Psi \vdash e_1\, e_2 : A_2 \rightsquigarrow (\langle A \hookrightarrow A_1 \to A_2 \rangle\, s_1)\, (\langle A_3 \hookrightarrow A_1 \rangle\, s_2)}\;\textsc{App}
$$

$$
\frac{\Psi \vdash \tau \qquad \Psi \vdash A[a \mapsto \tau] \triangleright A_1 \to A_2}{\Psi \vdash \forall a.A \triangleright A_1 \to A_2}\;\textsc{M-Forall}
\qquad
\frac{}{\Psi \vdash A_1 \to A_2 \triangleright A_1 \to A_2}\;\textsc{M-Arr}
\qquad
\frac{}{\Psi \vdash \star \triangleright \star \to \star}\;\textsc{M-Unknown}
$$

**Fig. 8.** Declarative typing

### **4 Gradually Typed Implicit Polymorphism**

In Sect. 3 we introduced the consistent subtyping relation that accommodates polymorphic types. In this section we continue with the development by giving a declarative type system for predicative implicit polymorphism that employs the consistent subtyping relation. The declarative system itself is already quite interesting as it is equipped with both higher-rank polymorphism and the unknown type. The syntax of expressions in the declarative system is given below:

Expressions e ::= x | n | λx : A. e | λx. e | e e

#### **4.1 Typing in Detail**

Figure 8 gives the typing rules for our declarative system (the reader is advised to ignore the gray-shaded parts for now). Rule Var extracts the type of a variable from the typing context. Rule Nat always infers integer types. Rule LamAnn puts x with type annotation A into the context, and continues type checking the body e. Rule Lam assigns a monotype τ to x, and continues type checking the body e. Gradual types and polymorphic types are introduced explicitly via annotations. Rule Gen puts a fresh type variable a into the type context and generalizes the typing result A to ∀a.A. Rule App first infers the type of e₁, then the matching judgment Ψ ⊢ A ▷ A₁ → A₂ extracts the domain type A₁ and the codomain type A₂ from type A. The type A₃ of the argument e₂ is then compared with A₁ using the consistent subtyping judgment.

*Matching.* The matching judgment of Siek et al. [25] can be extended to polymorphic types naturally, resulting in Ψ ⊢ A ▷ A₁ → A₂. In M-Forall, a monotype τ is guessed to instantiate the universal quantifier a. This rule is inspired by the *application judgment* Φ ⊢ A • e ⇒ C [11], which says that if we apply a term of type A to an argument e, we get something of type C. If A is a polymorphic type, the judgment works by guessing instantiations until it reaches an arrow type. Matching further simplifies the application judgment, since it is independent of typing. Rules M-Arr and M-Unknown are the same as in Siek et al. [25]. M-Arr returns the domain type A₁ and range type A₂ as expected. If the input is ★, then M-Unknown returns ★ as both the domain type and the range type.
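The matching judgment can be sketched as a small function (our own encoding; since the declarative M-Forall guesses a monotype, the guess is passed in explicitly here):

```python
# Matching: M-Arr returns the arrow's components, M-Unknown splits
# star into star -> star, and M-Forall substitutes a guessed monotype.
INT, STAR = "Int", "star"

def arr(a, b): return ("arr", a, b)
def var(a): return ("var", a)
def forall(a, body): return ("forall", a, body)

def subst(name, t, ty):
    """Capture-avoiding enough for this sketch: stop at a shadowing binder."""
    if ty == ("var", name):
        return t
    if isinstance(ty, tuple) and ty[0] == "arr":
        return arr(subst(name, t, ty[1]), subst(name, t, ty[2]))
    if isinstance(ty, tuple) and ty[0] == "forall" and ty[1] != name:
        return forall(ty[1], subst(name, t, ty[2]))
    return ty

def match(ty, guess):
    if isinstance(ty, tuple) and ty[0] == "arr":
        return (ty[1], ty[2])                            # M-Arr
    if ty == STAR:
        return (STAR, STAR)                              # M-Unknown
    if isinstance(ty, tuple) and ty[0] == "forall":
        return match(subst(ty[1], guess, ty[2]), guess)  # M-Forall
    return None                                          # no rule applies
```

For instance, matching `∀a.a → a` with the guess `Int` yields the pair `(Int, Int)`.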

Note that matching saves us from needing a subsumption rule (Sub in Fig. 2). The subsumption rule is incompatible with consistent subtyping, since the latter is not transitive. A discussion of a subsumption rule based on normal subtyping can be found in the appendix.

### **4.2 Type-Directed Translation**

We give the dynamic semantics of our language by translating it to λB. Below we show a subset of the terms in λB that are used in the translation:

$$\text{Terms} \quad s ::= x \mid n \mid \lambda x{:}A.\, s \mid \Lambda a.\, s \mid s_1\, s_2 \mid \langle A \hookrightarrow B \rangle\, s$$

A cast ⟨A ↪ B⟩ s converts the value of term s from type A to type B. A cast from A to B is permitted only if the types are *compatible*, written A ≺ B, as briefly mentioned in Sect. 3.1. The syntax of types in λB is the same as ours.

The translation is given in the gray-shaded parts in Fig. 8. The only interesting case here is to insert explicit casts in the application rule. Note that there is no need to translate matching or consistent subtyping, instead we insert the source and target types of a cast directly in the translated expressions, thanks to the following two lemmas:

<sup>L</sup>**Lemma 1 (▷ to ≺).** *If* Ψ ⊢ A ▷ A₁ → A₂*, then* A ≺ A₁ → A₂*.*

<sup>L</sup>**Lemma 2 (≲ to ≺).** *If* Ψ ⊢ A ≲ B*, then* A ≺ B*.*

In order to show the correctness of the translation, we prove that our translation always produces well-typed expressions in λB. By Lemmas 1 and 2, we have the following theorem:

### <sup>T</sup>**Theorem 2 (Type Safety).** *If* Ψ ⊢ e : A ⇝ s*, then* Ψ ⊢<sub>B</sub> s : A*.*

*Parametricity.* An important semantic property of polymorphic types is *relational parametricity* [19]. The parametricity property says that all instances of a polymorphic function should behave *uniformly*. A classic example is a function with the type ∀a.a → a. Parametricity guarantees that a value of this type must be either the identity function (i.e., λx.x) or the undefined function (one which never returns a value). However, with the addition of the unknown type ★, careful measures must be taken to ensure parametricity. This is exactly the circumstance that λB was designed to address. Ahmed et al. [2] proved that λB satisfies relational parametricity. Based on their result, and by Theorem 2, parametricity is preserved in our system.

*Ambiguity from Casts.* The translation does not always produce a unique target expression. This is because when we guess a monotype τ in rules M-Forall and CS-ForallL, we can make different choices, which inevitably leads to different types. Unlike in (non-gradual) polymorphic type systems [11,18], the choice of monotypes can affect the runtime behaviour of the translated programs, since the monotypes can appear inside the explicit casts. For example, the following shows two possible translations of the same source expression λx : ★. f x, where the type of f is instantiated to Int → Int and Bool → Bool, respectively:

$$
f : \forall a. a \to a \vdash (\lambda x{:}\star.\, f\, x) : \star \to \mathsf{Int}
\rightsquigarrow (\lambda x{:}\star.\, (\langle \forall a. a \to a \hookrightarrow \mathsf{Int} \to \mathsf{Int} \rangle\, f)\, (\langle \star \hookrightarrow \mathsf{Int} \rangle\, x))
$$

$$
f : \forall a. a \to a \vdash (\lambda x{:}\star.\, f\, x) : \star \to \mathsf{Bool}
\rightsquigarrow (\lambda x{:}\star.\, (\langle \forall a. a \to a \hookrightarrow \mathsf{Bool} \to \mathsf{Bool} \rangle\, f)\, (\langle \star \hookrightarrow \mathsf{Bool} \rangle\, x))
$$

If we apply λx : ★. f x to 3, which is fine since the function can take any input, the first translation runs smoothly in λB, while the second one raises a cast error (Int cannot be cast to Bool). Similarly, if we apply it to true, then the second succeeds while the first fails. The culprit lies in the casts: any instantiation of a is put inside an explicit cast. More generally, any choice introduces an explicit cast to that type in the translation, which causes a runtime cast error if the function is applied to a value whose type does not match the guessed type. Note that this does not compromise the type safety of the translated expressions, since cast errors are part of the type safety guarantees.

*Coherence.* The ambiguity of translation seems to imply that the declarative system is *incoherent*. A semantics is coherent if distinct typing derivations of the same typing judgment possess the same meaning [20]. We argue that the declarative system is "coherent up to cast errors" in the sense that a well-typed program produces a unique value, or results in a cast error. In the above example, whatever the translation might be, applying λx : ★. f x to 3 either results in a cast error, or produces 3, nothing else.

This discrepancy is due to the guessing nature of the *declarative* system. As far as the declarative system is concerned, both Int → Int and Bool → Bool are equally acceptable. But this is not the case at runtime. The acute reader may have noticed that the *only* appropriate choice is to instantiate a with ★, giving f the type ★ → ★. However, as specified by rule M-Forall in Fig. 8, we can only instantiate type variables with monotypes, and ★ is *not* a monotype! We will get back to this issue in Sect. 6.2 after we present the corresponding algorithmic system in Sect. 5.

#### **4.3 Correctness Criteria**

Siek et al. [25] present a set of properties that a well-designed gradual typing calculus must have, which they call the refined criteria. Among all the criteria, those related to the static aspects of gradual typing are well summarized by Cimini and Siek [8]. Here we review those criteria and adapt them to our notation. We have proved in Coq that our type system satisfies all these criteria.

### <sup>L</sup>**Lemma 3 (Correctness Criteria)**

- *if* Ψ ⊢<sub>OL</sub> e : A*, then there exists* B*, such that* Ψ ⊢ e : B*, and* Ψ ⊢ B <: A*.*
- *if* Ψ ⊢ e : A *for a static* e*, then* Ψ ⊢<sub>OL</sub> e : A*.*
- *if* Ψ ⊢ e : A*, and* e ⊑ e′*, then* Ψ ⊢ e′ : B *and* A ⊑ B *for some* B*.*
- *if* Ψ ⊢ e : A*, then* Ψ ⊢ e : A ⇝ s *and* Ψ ⊢<sub>B</sub> s : A *for some* s*.*
- *if* Ψ ⊢ e₁ : A ⇝ s₁*,* Ψ ⊢ e₂ : A ⇝ s₂*, and* e₁ ⊑ e₂*, then* s₁ ⊑ s₂*.*

The first criterion states that the gradual type system should be a conservative extension of the original system. In other words, a *static* program is typeable in the Odersky-Läufer type system if and only if it is typeable in the gradual type system. A static program is one that does not contain the unknown type ★<sup>7</sup>. However, since our gradual type system does not have the subsumption rule, it produces more general types.

The second criterion states that if a typeable expression loses some type information, it remains typeable. This criterion depends on the definition of the precision relation, written A ⊑ B, which is given in the appendix. The relation intuitively captures a notion of types containing more or fewer unknown types (★). The precision relation over types lifts to programs, i.e., e₁ ⊑ e₂ means that e₁ and e₂ are the same program except that e₂ has more unknown types.

The first two criteria are fundamental to gradual typing. They explain, for example, why both (λx : Int. x + 1) and (λx : ★. x + 1) are typeable: the former is typeable in the Odersky-Läufer type system, and the latter is a less precise version of it.

The last two criteria relate the compilation to the cast calculus. The third criterion is essentially the same as Theorem 2, given that a target expression always exists, which can easily be seen from Fig. 8. The last criterion ensures that the translation is monotonic over the precision relation ⊑.

As for the dynamic guarantee, things become a bit murky for two reasons: (1) as we discussed before, our declarative system is incoherent in that the runtime behaviour of the same source program can vary depending on the particular translation; (2) it is still unknown whether the dynamic guarantee holds in λB. We discuss the dynamic guarantee further in Sect. 6.3.

### **5 Algorithmic Type System**

In this section we give a bidirectional account of the algorithmic type system that implements the declarative specification. The algorithm is largely inspired by the

<sup>7</sup> Note that the term *static* has appeared several times with different meanings.


Expressions e ::= x | n | λx : A. e | λx. e | e e | e : A

Algorithmic contexts Γ, Δ, Θ ::= • | Γ, a | Γ, x : A | Γ, â | Γ, â = τ

Complete contexts Ω ::= • | Ω, a | Ω, x : A | Ω, â = τ

**Fig. 9.** Syntax of the algorithmic system

$$\Gamma \vdash A \lesssim B \dashv \Delta$$

$$
\frac{}{\Gamma[a] \vdash a \lesssim a \dashv \Gamma[a]}\;\textsc{ACS-TVar}
\qquad
\frac{}{\Gamma[\widehat{a}] \vdash \widehat{a} \lesssim \widehat{a} \dashv \Gamma[\widehat{a}]}\;\textsc{ACS-ExVar}
\qquad
\frac{}{\Gamma \vdash \mathsf{Int} \lesssim \mathsf{Int} \dashv \Gamma}\;\textsc{ACS-Int}
$$

$$
\frac{}{\Gamma \vdash \star \lesssim A \dashv \Gamma}\;\textsc{ACS-UnknownL}
\qquad
\frac{}{\Gamma \vdash A \lesssim \star \dashv \Gamma}\;\textsc{ACS-UnknownR}
$$

$$
\frac{\Gamma \vdash B_1 \lesssim A_1 \dashv \Theta \qquad \Theta \vdash [\Theta]A_2 \lesssim [\Theta]B_2 \dashv \Delta}{\Gamma \vdash A_1 \to A_2 \lesssim B_1 \to B_2 \dashv \Delta}\;\textsc{ACS-Fun}
$$

$$
\frac{\Gamma, a \vdash A \lesssim B \dashv \Delta, a, \Theta}{\Gamma \vdash A \lesssim \forall a.B \dashv \Delta}\;\textsc{ACS-ForallR}
\qquad
\frac{\Gamma, \widehat{a} \vdash A[a \mapsto \widehat{a}] \lesssim B \dashv \Delta}{\Gamma \vdash \forall a.A \lesssim B \dashv \Delta}\;\textsc{ACS-ForallL}
$$

$$
\frac{\widehat{a} \notin \mathit{fv}(A) \qquad \Gamma[\widehat{a}] \vdash \widehat{a} \lessapprox A \dashv \Delta}{\Gamma[\widehat{a}] \vdash \widehat{a} \lesssim A \dashv \Delta}\;\textsc{ACS-InstL}
\qquad
\frac{\widehat{a} \notin \mathit{fv}(A) \qquad \Gamma[\widehat{a}] \vdash A \lessapprox \widehat{a} \dashv \Delta}{\Gamma[\widehat{a}] \vdash A \lesssim \widehat{a} \dashv \Delta}\;\textsc{ACS-InstR}
$$

**Fig. 10.** Algorithmic consistent subtyping

algorithmic bidirectional system of Dunfield and Krishnaswami [11] (henceforth DK system). However, our algorithmic system differs from theirs in three aspects: (1) the addition of the unknown type ★; (2) the use of the matching judgment; and (3) the approach of *gradual inference only producing static types* [12]. We then prove that our algorithm is both sound and complete with respect to the declarative type system. Full proofs can be found in the appendix.

*Algorithmic Contexts.* The algorithmic context Γ is an *ordered* list containing declarations of type variables a and term variables x : A. Unlike declarative contexts, algorithmic contexts also contain declarations of existential type variables â, which can be either unsolved (written â) or solved to some monotype (written â = τ). Complete contexts Ω are those that contain no unsolved existential type variables. Figure 9 shows the syntax of the algorithmic system. Apart from the expressions of the declarative system, we also have annotated expressions e : A.
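A small sketch of algorithmic contexts and of applying a context to a type (the representation is ours; entries mirror the four kinds of declarations just described):

```python
# Entries: ("tvar", a), ("var", x, A), ("evar", a), ("solved", a, tau).
# Applying a context substitutes solved existential variables, which
# maintains the invariant that types are fully applied under the
# input context (no existential variable already solved remains).
INT = "Int"

def arr(a, b): return ("arr", a, b)
def evar(a): return ("evar", a)

def apply_ctx(ctx, ty):
    if isinstance(ty, tuple) and ty[0] == "evar":
        for entry in ctx:
            if entry[0] == "solved" and entry[1] == ty[1]:
                # a solution may itself mention other existentials
                return apply_ctx(ctx, entry[2])
        return ty  # unsolved: left as-is
    if isinstance(ty, tuple) and ty[0] == "arr":
        return arr(apply_ctx(ctx, ty[1]), apply_ctx(ctx, ty[2]))
    return ty
```

For example, under a context solving `â = b̂ → Int` and `b̂ = Int`, the type `â → Int` is fully applied to `(Int → Int) → Int`.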

### **5.1 Algorithmic Consistent Subtyping and Instantiation**

Figure 10 shows the algorithmic consistent subtyping rules. The first five rules do not manipulate contexts. Rule ACS-Fun is a natural extension of its declarative counterpart. The output context of the first premise is used by the second

$$\Gamma \vdash \widehat{a} \lessapprox A \dashv \Delta$$

$$
\frac{\Gamma \vdash \tau}{\Gamma, \widehat{a}, \Gamma' \vdash \widehat{a} \lessapprox \tau \dashv \Gamma, \widehat{a} = \tau, \Gamma'}\;\textsc{InstLSolve}
\qquad
\frac{}{\Gamma[\widehat{a}] \vdash \widehat{a} \lessapprox \star \dashv \Gamma[\widehat{a}]}\;\textsc{InstLSolveU}
$$

$$
\frac{}{\Gamma[\widehat{a}][\widehat{b}] \vdash \widehat{a} \lessapprox \widehat{b} \dashv \Gamma[\widehat{a}][\widehat{b} = \widehat{a}]}\;\textsc{InstLReach}
\qquad
\frac{\Gamma[\widehat{a}], b \vdash \widehat{a} \lessapprox B \dashv \Delta, b, \Delta'}{\Gamma[\widehat{a}] \vdash \widehat{a} \lessapprox \forall b.B \dashv \Delta}\;\textsc{InstLAllR}
$$

$$
\frac{\Gamma[\widehat{a}_2, \widehat{a}_1, \widehat{a} = \widehat{a}_1 \to \widehat{a}_2] \vdash A_1 \lessapprox \widehat{a}_1 \dashv \Theta \qquad \Theta \vdash \widehat{a}_2 \lessapprox [\Theta]A_2 \dashv \Delta}{\Gamma[\widehat{a}] \vdash \widehat{a} \lessapprox A_1 \to A_2 \dashv \Delta}\;\textsc{InstLArr}
$$

**Fig. 11.** Algorithmic instantiation

premise, and the output context of the second premise is the output context of the conclusion. Note that we do not simply check A₂ ≲ B₂, but apply Θ to both types (e.g., [Θ]A₂). This maintains an important invariant: types are fully applied under their input context Γ (they contain no existential variables already solved in Γ). The same invariant applies to every algorithmic judgment. Rule ACS-ForallR looks similar to its declarative counterpart, except that we need to drop the trailing context a, Θ from the concluding output context, since its declarations go out of scope. Rule ACS-ForallL generates a fresh existential variable â and replaces a with â in the body A. The new existential variable â is then added to the premise's input context. As a side note, when both types are quantifiers, either ACS-ForallR or ACS-ForallL could be tried. In practice, one can apply ACS-ForallR eagerly. The last two rules together check consistent subtyping with an unsolved existential variable on one side and an arbitrary type on the other side, with the help of the instantiation judgment. The judgment Γ ⊢ â ≲ A ⊣ Δ, defined in Fig. 11, instantiates unsolved existential variables. The judgment â ≲ A reads "instantiate â to a consistent subtype

of A". For space reasons, we omit its symmetric judgment Γ ⊢ A ≲ â ⊣ Δ. Rules InstLSolve and InstLReach set â to τ and to b̂, respectively, in the output context. Rule InstLSolveU is similar to ACS-UnknownR in that we put no constraint on â when it meets the unknown type ★. This design decision reflects the point that type inference only produces static types [12]. We will get back to this point in Sect. 6.2. Rule InstLAllR is the instantiation version of rule ACS-ForallR. The last rule, InstLArr, applies when â meets a function type; it follows that the solution must also be a function type. That is why, in the first premise, we generate two fresh existential variables â₁ and â₂ and insert them just before â in the input context, so that the solution of â can mention them. Note that A₁ ≲ â₁ switches to the other instantiation judgment.
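To make the instantiation judgment concrete, the following is a minimal Python sketch, not the paper's algorithm: the ordered-context discipline is collapsed into a simple solution map, and the names `Ctx`, `inst_l`, `inst_r` are ours. It shows an existential variable being solved against a monotype, left unconstrained by the unknown type, and decomposed against an arrow type.

```python
UNKNOWN = "?"

def is_ex(t):
    """Existential variables are modeled as strings beginning with '^'."""
    return isinstance(t, str) and t.startswith("^")

class Ctx:
    """A simplified algorithmic context: a map from existential variables
    to their solutions (the paper's ordering invariants are dropped)."""
    def __init__(self):
        self.solutions = {}
        self.fresh = 0

    def new_ex(self):
        self.fresh += 1
        return "^a%d" % self.fresh

    def apply(self, t):
        """Apply the context as a substitution, so outputs contain no
        already-solved existential variables (the paper's invariant)."""
        if is_ex(t) and t in self.solutions:
            return self.apply(self.solutions[t])
        if isinstance(t, tuple):  # arrow type, represented as (A1, A2)
            return (self.apply(t[0]), self.apply(t[1]))
        return t

    def inst_l(self, a, t):
        """Instantiate the existential a to a consistent subtype of t."""
        if t == UNKNOWN:
            return                  # InstLSolveU: '?' imposes no constraint
        if isinstance(t, tuple):    # InstLArr: the solution must be an arrow
            a1, a2 = self.new_ex(), self.new_ex()
            self.solutions[a] = (a1, a2)
            self.inst_r(t[0], a1)   # contravariant: switch judgments
            self.inst_l(a2, self.apply(t[1]))
            return
        self.solutions[a] = t       # InstLSolve / InstLReach: solve directly

    def inst_r(self, t, a):
        """The symmetric judgment; identical on this monotype fragment."""
        self.inst_l(a, t)

ctx = Ctx()
a = ctx.new_ex()
ctx.inst_l(a, ("Int", "Bool"))   # instantiate against Int -> Bool
print(ctx.apply(a))              # ('Int', 'Bool')
```

Instantiating against `UNKNOWN` leaves the variable unsolved, mirroring the point that unification with ★ records no constraint.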

#### **5.2 Algorithmic Typing**

We now turn to the algorithmic typing rules in Fig. 12. The algorithmic system uses bidirectional type checking to accommodate polymorphism. Most of

#### **Fig. 12.** Algorithmic typing *(rules not recoverable from the extraction)*

them are quite standard. Perhaps rule AApp (which differs significantly from that in the DK system) deserves attention. It relies on the algorithmic matching judgment Γ ⊢ A ▷ A₁ → A₂ ⊣ Δ. Rule AM-ForallL replaces a with a fresh existential variable â, thus eliminating guessing. Rules AM-Arr and AM-Unknown correspond directly to the declarative rules. Rule AM-Var, which has no corresponding declarative version, is similar to InstRArr/InstLArr: we create â and b̂ and add ĉ = â → b̂ to the context.

### **5.3 Soundness and Completeness**

We prove that the algorithmic rules are sound and complete with respect to the declarative specification. We need an auxiliary judgment Γ −→ Δ that captures a notion of information increase from input contexts Γ to output contexts Δ [11].

*Soundness.* Roughly speaking, soundness of the algorithmic system says that, given an expression e that type checks in the algorithmic system, there exists a corresponding expression e′ that type checks in the declarative system. However, there is one complication: e′ may need more annotations than e. For example, by ALam we have ⊢ λx. x ⇐ (∀a.a) → (∀a.a), but λx. x itself cannot have type (∀a.a) → (∀a.a) in the declarative system. To circumvent that, we add an annotation to the lambda abstraction, resulting in λx : (∀a.a). x, which is typeable in the declarative system with the same type. To relate λx. x and λx : (∀a.a). x, we erase all annotations on both expressions. The definition of erasure ⌊·⌋ is standard and thus omitted.

### **Theorem 1 (Soundness of Algorithmic Typing).** *Given* Δ −→ Ω*,*

*1. If* Γ ⊢ e ⇒ A ⊣ Δ*, then* ∃e′ *such that* [Ω]Δ ⊢ e′ : [Ω]A *and* ⌊e⌋ = ⌊e′⌋*.*
*2. If* Γ ⊢ e ⇐ A ⊣ Δ*, then* ∃e′ *such that* [Ω]Δ ⊢ e′ : [Ω]A *and* ⌊e⌋ = ⌊e′⌋*.*

*Completeness.* Completeness of the algorithmic system is the reverse of soundness: given a declarative judgment of the form [Ω]Γ ⊢ [Ω] ..., we want to obtain an algorithmic derivation of Γ ⊢ ··· ⊣ Δ. It turns out that completeness is a bit trickier to state, in that the algorithmic rules generate existential variables on the fly, so Δ could contain unsolved existential variables that are found neither in Γ nor in Ω. Therefore the completeness proof must produce another complete context Ω′ that extends both the output context Δ and the given complete context Ω. As with soundness, we need erasure to relate the two expressions.

**Theorem 2 (Completeness of Algorithmic Typing).** *Given* Γ −→ Ω *and* Γ ⊢ A*, if* [Ω]Γ ⊢ e : A*, then there exist* Δ*,* Ω′*,* A′ *and* e′ *such that* Δ −→ Ω′ *and* Ω −→ Ω′ *and* Γ ⊢ e′ ⇒ A′ ⊣ Δ *and* A = [Ω′]A′ *and* ⌊e⌋ = ⌊e′⌋*.*

### **6 Discussion**

### **6.1 Top Types**

To demonstrate that our definition of consistent subtyping (Definition 2) is applicable to other features, we show how to extend our approach to the top type ⊤, with all the desired properties preserved.

In order to preserve the orthogonality between subtyping and consistency, we require ⊤ to be a common supertype of all *static* types, as shown in rule S-Top. This rule might seem strange at first glance, since the rule looks reasonable even without the premise A static. However, an important point is that, precisely because of the orthogonality between subtyping and consistency, subtyping itself should not contain any potential information loss! Therefore, subtyping instances such as ★ <: ⊤ are not allowed. For consistency, we add the rule that ⊤ is consistent with ⊤, which is actually subsumed by the reflexivity rule A ∼ A. For consistent subtyping, every type is a consistent subtype of ⊤: for example, Int → ★ ≲ ⊤.

$$\frac{A\ \text{static}}{\Psi \vdash A <: \top}\ \textsc{S-Top} \qquad\qquad \top \sim \top \qquad\qquad \frac{}{\Psi \vdash A \lesssim \top}\ \textsc{CS-Top}$$

It is easy to verify that Definition 2 is still equivalent to that in Fig. 7 extended with rule CS-Top. That is, Theorem 1 holds:

**Proposition 4 (Extension with ⊤).** Ψ ⊢ A ≲ B ⇔ Ψ ⊢ A <: C*,* C ∼ D*,* Ψ ⊢ D <: B*, for some* C, D*.*

We extend the definition of concretization (Definition 3) with ⊤ by adding another equation γ(⊤) = {⊤}. Note that Castagna and Lanvin [7] also have this equation in their calculus. It is easy to verify that Proposition 2 still holds:

#### **Proposition 5 (Equivalent to AGT on ⊤).** A ≲ B *if and only if* A <̃: B*.*

*Siek and Taha's* [22] *Definition of Consistent Subtyping Does Not Work for* ⊤. As in the analysis in Sect. 3.2, Int → ★ ≲ ⊤ only holds if we first apply consistency, then subtyping. However, we cannot find a type A such that Int → ★ <: A and A ∼ ⊤. We also have a similar problem in extending the restriction operator: *non-structural* masking between Int → ★ and ⊤ cannot be easily achieved.
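As a sanity check on the extended relation, the following Python sketch (our illustration, not the paper's formalization; base types are strings, arrows are pairs, and polymorphism is omitted) decides consistent subtyping for this small fragment, including rule CS-Top:

```python
# Types: base types as strings, the unknown type "?", "Top", and arrow
# types represented as pairs (A1, A2).
UNKNOWN, TOP = "?", "Top"

def consistent_subtype(a, b):
    """A hedged, simplified check of A <~ B for the fragment above."""
    if a == UNKNOWN or b == UNKNOWN:
        return True     # the unknown type is consistent with anything
    if b == TOP:
        return True     # CS-Top: every type is a consistent subtype of Top
    if isinstance(a, tuple) and isinstance(b, tuple):
        # contravariant in the domain, covariant in the codomain
        return consistent_subtype(b[0], a[0]) and consistent_subtype(a[1], b[1])
    return a == b       # base types: reflexivity

print(consistent_subtype(("Int", UNKNOWN), TOP))   # True: Int -> ? <~ Top
print(consistent_subtype(TOP, "Int"))              # False
```

Note that the check accepts `Int -> ? <~ Top` directly, the very instance that the composition "consistency then subtyping" cannot justify.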

### **6.2 Interpretation of the Dynamic Semantics**

In Sect. 4.2 we saw an example where a source expression could produce two different target expressions with different runtime behaviour. As we explained, this is due to the guessing nature of the declarative system; from the typing point of view, no type is particularly better than the others. In practice, however, this is not desirable. Let us revisit the same example, now from the algorithmic point of view (we omit the translation for space reasons):

$$f: \forall a.\, a \to a \vdash (\lambda x: \star.\ f \; x) \Rightarrow \star \to \widehat{a} \dashv f: \forall a.\, a \to a, \widehat{a}$$

Compared with declarative typing, which produces many types (★ → Int, ★ → Bool, and so on), the algorithm computes the type ★ → â, with â unsolved in the output context. What do we know from the output context? Only that â is not constrained at all! However, it is possible to make a more refined distinction between kinds of existential variables. The first kind are those that indeed have no constraints at all; they do not affect the dynamic semantics. The second kind (as in this example) are those where the only constraint is that *the variable was once compared with an unknown type* [12].

To emphasize the difference and better support the dynamic semantics, we could have *gradual variables* in addition to existential variables, the difference being that only unsolved gradual variables may be unified with the unknown type. An irreversible transition from existential variable to gradual variable occurs when an existential variable is compared with ★. After the algorithm terminates, we can set all unsolved existential variables to any (static) type (or, more precisely, as in Garcia and Cimini [12], to *static type parameters*), and all unsolved gradual variables to ★ (or *gradual type parameters*). However, this approach requires a more sophisticated declarative/algorithmic type system than the ones presented in this paper, where we only produce static monotypes in type inference. We believe this is a typical trade-off in existing gradual type systems with inference [12,23]. Here we suppress the complexity of the dynamic semantics in favour of the conciseness of static typing.

### **6.3 The Dynamic Guarantee**

In Sect. 4.3 we mentioned that the dynamic guarantee is closely related to the coherence issue. To aid discussion, we first give the definition of dynamic guarantee as follows:

**Definition 5 (Dynamic guarantee).** *Suppose* e′ ⊑ e*,* ∅ ⊢ e : A ⇝ s *and* ∅ ⊢ e′ : A′ ⇝ s′*. If* s ⇓ v*, then* s′ ⇓ v′ *and* v′ ⊑ v*.*

The dynamic guarantee says that if a gradually typed program evaluates to a value, then removing type annotations always produces a program that evaluates to an equivalent value (modulo type annotations). Now apparently the coherence issue of the declarative system breaks the dynamic guarantee. For instance:

$$(\lambda f : \forall a.\, a \to a.\ \lambda x : \mathsf{Int}.\ f\ x)\ (\lambda x.\ x)\ 3 \qquad\qquad (\lambda f : \forall a.\, a \to a.\ \lambda x : \star.\ f\ x)\ (\lambda x.\ x)\ 3$$

The left one evaluates to 3, whereas its less precise version (right) gives a cast error if, for example, a is instantiated to Bool.

As discussed in Sect. 6.2, we could design a more sophisticated declarative/algorithmic type system in which coherence is retained. However, even with a coherent source language, the dynamic guarantee remains problematic: it is currently an open question for our target language λB. According to Igarashi et al. [14], the difficulty lies in defining a term precision that preserves the semantics.

### **7 Related Work**

Along the way we discussed some of the most relevant work to motivate, compare and promote our gradual typing design. In what follows, we briefly discuss related work on gradual typing and polymorphism.

*Gradual Typing.* The seminal paper by Siek and Taha [21] was the first to propose gradual typing. The original proposal extends the simply typed lambda calculus by introducing the unknown type ★ and replacing type equality with type consistency. Later, Siek and Taha [22] incorporated gradual typing into a simple object-oriented language, and showed that subtyping and consistency are orthogonal – an insight that partly inspired our work. We show that subtyping and consistency remain orthogonal in a much richer type system with higher-rank polymorphism. Siek et al. [25] proposed a set of criteria that provides important guidelines for designers of gradually typed languages. Cimini and Siek [8] introduced the *Gradualizer*, a general methodology for generating gradual type systems from static type systems. Later, they also developed an algorithm to generate dynamic semantics [9]. Garcia et al. [13] introduced the AGT approach based on abstract interpretation.

*Gradual Type Systems with Explicit Polymorphism.* Ahmed et al. [1] proposed λB, which extends the blame calculus [29] to incorporate polymorphism. The key novelty of their work is the use of dynamic sealing to enforce parametricity. Devriese et al. [10] proved that the embedding of System F terms into λB is not fully abstract. Igarashi et al. [14] also studied integrating gradual typing with parametric polymorphism. They proposed System F*G*, a gradually typed extension of System F, and System F*<sup>C</sup>*, a new polymorphic blame calculus. As has been discussed extensively, their definition of type consistency does not apply to our setting (implicit polymorphism). All of these approaches mix consistency with subtyping to some extent, which we argue should be orthogonal.

*Gradual Type Inference.* Siek and Vachharajani [23] studied unification-based type inference for gradual typing, where they show why three straightforward approaches fail to meet their design goals. Their type system infers gradual types, which results in a complicated type system and inference algorithm. Garcia and Cimini [12] presented a new approach where gradual type inference only produces static types, which we adopt in our type system. They also deal with let-polymorphism (rank-1 types). However, none of these works deals with higher-rank implicit polymorphism.

*Higher-Rank Implicit Polymorphism.* Odersky and Läufer [17] introduced a type system for higher-rank types. Building on that, Peyton Jones et al. [18] developed an approach for type checking higher-rank predicative polymorphism. Dunfield and Krishnaswami [11] proposed a bidirectional account of higher-rank polymorphism, and an algorithm for implementing the declarative system, which serves as the main inspiration for our algorithmic system. The key difference, however, is the integration of gradual typing. Vytiniotis et al. [28] defer static type errors to runtime, which is fundamentally different from gradual typing, where programmers can control whether checks are static or dynamic through the precision of annotations.

### **8 Conclusion**

In this paper, we present a generalized definition of consistent subtyping, which we prove applicable to both polymorphic and top types. Based on the new definition of consistent subtyping, we have developed a gradually typed calculus with predicative implicit higher-rank polymorphism, and an algorithm to implement it. As future work, we are interested in investigating whether our results can scale to real-world languages and other programming language features.

**Acknowledgements.** We thank Ronald Garcia and the anonymous reviewers for their helpful comments. This work has been sponsored by the Hong Kong Research Grant Council projects number 17210617 and 17258816.

### **References**


30 N. Xie et al.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **HOBiT: Programming Lenses Without Using Lens Combinators**

Kazutaka Matsuda1(B) and Meng Wang<sup>2</sup>

<sup>1</sup> Tohoku University, Sendai 980-8579, Japan kztk@ecei.tohoku.ac.jp <sup>2</sup> University of Bristol, Bristol BS8 1TH, UK

**Abstract.** We propose HOBiT, a higher-order bidirectional programming language, in which users can write bidirectional programs in the familiar style of conventional functional programming, while enjoying the full expressiveness of lenses. A bidirectional transformation, or a lens, is a pair of mappings between source and view data objects, one in each direction. When the view is modified, the source is updated accordingly with respect to some laws—a pattern that is found in databases, model-driven development, compiler construction, and so on. The most common way of programming lenses is with lens combinators, which are lens-to-lens functions that compose simpler lenses to form more complex ones. Lens combinators preserve the bidirectionality of lenses and are expressive; but they compel programmers to a specialised point-free style—i.e., no naming of intermediate computation results—limiting the scalability of bidirectional programming. To address this issue, we propose a new bidirectional programming language HOBiT, in which lenses are represented as standard functions, and combinators are mapped to language constructs with binders. This design transforms bidirectional programming, enabling programmers to write bidirectional programs in a flexible functional style and at the same time access the full expressiveness of lenses. We formally define the syntax, type system, and semantics of the language, and then show that programs in HOBiT satisfy bidirectionality. Additionally, we demonstrate HOBiT's programmability with examples.

### **1 Introduction**

Transforming data from one format to another is a common task of programming: compilers transform program texts into syntax trees, manipulate the trees and then generate low-level code; database queries transform base relations into views; model transformations generate lower-level implementations from higher-level models; and so on. Very often, such transformations will benefit from being bidirectional, allowing changes to the targets to be mapped back to the sources too. For example, if one can run a compiler front-end (preprocessing, parsing, desugaring, etc.) backwards, then all sorts of program analysis tools will be able to focus on a much smaller core language, without sacrificing usability, as

their outputs in terms of the core language will be transformed backwards to the source language. In the same way, such needs arise in databases (the *view-update problem* [1,6,12]) and model-driven engineering (bidirectional model transformation) [28,33,35].

As a response to this challenge, programming language researchers have started to design languages that execute deterministically in both directions, among which the lens framework is the most prominent. In the lens framework, a *bidirectional transformation* (or a *lens*) ℓ ∈ *Lens* S V consists of *get* ℓ ∈ S → V and *put* ℓ ∈ S → V → S [3,7,8]. (When clear from the context, or unimportant, we sometimes omit the lens name and write simply *get*/*put*.) Function *get* extracts a view from a source, and *put* takes both an updated view and the original source as inputs to produce an updated source. The additional parameter of *put* makes it possible to recover some of the source data that is not present in the view. In other words, *get* need not be injective to have a *put*. Not all pairs of *get*/*put* are considered correct lenses. The following round-tripping laws of a lens are generally required to establish bidirectionality:

$$\begin{aligned} \mathit{put}\ \ell\ s\ v &= s &&\text{if}\quad \mathit{get}\ \ell\ s = v &&(\textbf{Acceptability})\\ \mathit{get}\ \ell\ s' &= v &&\text{if}\quad \mathit{put}\ \ell\ s\ v = s' &&(\textbf{Consistency}) \end{aligned}$$

for all *s*, *s′* and *v*. (In this paper we write *e* = *e′* with the assumption that neither *e* nor *e′* is undefined. Stronger variants of the laws enforcing totality exist elsewhere, for example in [7].) Here *consistency* ensures that all updates on a view are captured by the updated source, and *acceptability* prohibits changes to the source if no update has been made on the view. Collectively, the two laws define *well-behavedness* [1,7,12].

The most common way of programming lenses is with lens combinators [3,7,8], which are basically a selection of lens-to-lens functions that compose simpler lenses to form more complex ones. This combinator-based approach follows the long history of lightweight language development in functional programming. The distinctive advantage of this approach is that, by restricting the lens language to a few selected combinators, well-behavedness is more easily preserved in programming: given well-behaved lenses as inputs, the combinators are guaranteed to produce well-behaved lenses. This idea of lens combinators is very influential academically, and various designs and implementations have been proposed [2,3,7–9,16,17,27,32] over the years.
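To make the interface and the laws concrete, here is a small Python sketch (our illustration; `Lens`, `fst_lens`, and `compose` are hypothetical names, and totality concerns are ignored) of lenses as get/put pairs, together with a composition combinator that preserves well-behavedness:

```python
from collections import namedtuple

# A lens is a pair of functions; put also receives the ORIGINAL source,
# which is how a non-injective get can still be inverted.
Lens = namedtuple("Lens", ["get", "put"])

# fst_lens views the first component of a pair; put restores the hidden
# second component from the original source.
fst_lens = Lens(get=lambda s: s[0], put=lambda s, v: (v, s[1]))

def compose(l2, l1):
    """Lens composition: get runs l1 then l2; put pushes the updated
    view back through l2, then through l1, threading original sources."""
    return Lens(
        get=lambda s: l2.get(l1.get(s)),
        put=lambda s, v: l1.put(s, l2.put(l1.get(s), v)),
    )

lens = compose(fst_lens, fst_lens)   # view the first of the first component
s = ((1, "a"), "b")
print(lens.get(s))                      # 1
print(lens.put(s, 9))                   # ((9, 'a'), 'b')
print(lens.put(s, lens.get(s)) == s)    # True: Acceptability
print(lens.get(lens.put(s, 9)) == 9)    # True: Consistency
```

The point of the combinator style is visible here: `compose` never inspects its arguments' internals, so it yields a well-behaved lens whenever its inputs are well-behaved.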

#### **1.1 The Challenge of Programmability**

The complexity of a piece of software can be classified as either intrinsic or accidental. Intrinsic complexity reflects the inherent difficulty of the problem at hand, whereas accidental complexity arises from the particular programming language, design or tools used to implement the solution. This work aims at reducing the accidental complexity of bidirectional programming by contributing to the design of bidirectional languages. In particular, we identify a language restriction—i.e., no naming of intermediate computation results—which complicates lens programming, and propose a new design that removes it.

As a teaser to demonstrate the problem, let us consider the list append function. In standard unidirectional programming, it can be defined simply as *append* x y = **case** x **of** {[ ] → y; a : x′ → a : *append* x′ y}. Astute readers may have already noticed that *append* is defined by structural recursion on x, which can be made explicit by using *foldr*, as in *append* x y = *foldr* (:) y x.
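In Python terms (a stand-in for the paper's functional notation), the structural recursion and its fold form can be sketched as:

```python
from functools import reduce

def append(x, y):
    """append x y = case x of { [] -> y; a : x' -> a : append x' y }"""
    return y if not x else [x[0]] + append(x[1:], y)

def append_foldr(x, y):
    """foldr (:) y x: fold x from the right, starting from y."""
    return reduce(lambda acc, a: [a] + acc, reversed(x), y)

print(append([1, 2], [3]))        # [1, 2, 3]
print(append_foldr([1, 2], [3]))  # [1, 2, 3]
```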

But in a lens language based on combinators, things are more difficult. Specifically, *append* now requires a more complicated recursion pattern, as below.

```
appendL :: Lens ([A], [A]) [A]
appendL =
  cond idL (λ_. True) (λ_. λ_. [ ]) (consL ˆ◦ (idL × appendL)) (not ◦ null) (λ_. λ_. ⊥)
  ˆ◦ rearr ˆ◦ (outListL × idL)
 where outListL :: Lens [A] (Either () (A, [A]))
       rearr    :: Lens (Either () (a, b), c) (Either c (a, (b, c)))
       (ˆ◦)     :: Lens b c → Lens a b → Lens a c
       cond     :: Lens a c → ... → Lens b c → ... → Lens (Either a b) c
       ...
```
It is beyond the scope of this paper to explain how exactly the definition of *appendL* works, as its obscurity is what this work aims to remove. Instead, we informally describe its behaviour and the various components of the code. The above code defines a lens: forwards, it behaves as the standard *append*, and backwards, it splits the updated view list, and when the length of the list changes, this definition implements (with the grayed part) the bias of keeping the length of the first source list whenever possible (to disambiguate multiple candidate source changes). Here, *cond*, (ˆ◦), etc. are lens combinators and *outListL* and *rearr* are auxiliary lenses, as can be seen from their types. Unlike its unidirectional counterpart, *appendL* can no longer be defined as a structural recursion on list; instead it traverses a pair of lists with rather complex rearrangement *rearr* .

Intuitively, the additional grayed parts are intrinsic complexity, as they are needed for directing backwards execution. However, the complicated recursion scheme, which is a direct result of the underlying limitation of lens languages, is certainly accidental. Recall that in the definition of *append*, we were able to use the variable *y*, which is bound outside of the recursion pattern, inside the body of *foldr*. But the same is not possible with lens combinators, which are strictly 'point-free'. Moreover, even if one could name such variables (points), their usage with lens combinators would be very restricted in order to guarantee well-behavedness [21,23]. This problem is specific to opaque non-function objects such as lenses, and goes well beyond the traditional issues associated with the point-free programming style.

In this paper, we design a new bidirectional language HOBiT, which aims to remove much of the accidental difficulty found in combinator-based lens programming, and reduces the gap between bidirectional programming and standard functional programming. For example, the following definition in HOBiT implements the same lens as *appendL*.

$$
\begin{array}{l}
\mathit{appendB} :: \mathbf{B}[A] \to \mathbf{B}[A] \to \mathbf{B}[A]\\
\mathit{appendB}\ x\ y = \underline{\mathtt{case}}\ x\ \underline{\mathtt{of}}\\
\quad [\,] \to y \quad \underline{\mathtt{with}}\ \lambda\_.\,\mathtt{True} \quad \underline{\mathtt{by}}\ (\lambda\_.\,\lambda\_.\,[\,])\\
\quad a : x' \to a : \mathit{appendB}\ x'\ y \quad \underline{\mathtt{with}}\ \mathit{not} \circ \mathit{null} \quad \underline{\mathtt{by}}\ (\lambda\_.\,\lambda\_.\,\bot)
\end{array}
$$

As expected, the above code shares the grayed part with the definition of *appendL*, as the two implement the same backwards behaviour. The difference is that *appendB* uses structural recursion in the same way as the standard unidirectional *append*, greatly simplifying programming. This is made possible by HOBiT's type system and semantics, which allow unrestricted use of free variables. This difference in approach is also reflected in the types: *appendB* is a proper function (instead of the abstract lens type of *appendL*), which readily lends itself to conventional functional programming. At the same time, *appendB* is also a proper lens, which when executed by the HOBiT interpreter behaves exactly like *appendL*. A major technical challenge in the design of HOBiT is to guarantee this duality, so that functions like *appendB* are well-behaved by construction despite the flexibility in their construction.

### **1.2 Contributions**

As we can already see from the very simple example above, the use of HOBiT simplifies bidirectional programming by removing much of the accidental complexity. Specifically, HOBiT stands out from existing bidirectional languages in two ways:


Thanks to these distinctive advantages, HOBiT for the first time allows us to construct realistically sized bidirectional programs with relative ease. Of course, this does not mean a free lunch: the ability to control backwards behaviours does not magically come without additional code (for example, the grayed part above). What HOBiT achieves is that programming effort may now focus on the productive part of specifying backwards behaviours, instead of being consumed by circumventing language restrictions.

In summary, we make the following contributions in this paper.

– We design a higher-order bidirectional programming language HOBiT, which supports convenient bidirectional programming with control of backwards behaviours (Sect. 3). We also discuss several extensions to the language (Sect. 5).


### **2 Overview: Bidirectional Programming Without Combinators**

In this section, we informally introduce the essential constructs of HOBiT and demonstrate their use by a few small examples. Recall that, as seen in the *appendB* example, the strength of HOBiT lies in allowing programmers to access λ-abstractions without restrictions on the use of λ-bound variables.

### **2.1 The case Construct**

The most important language construct in HOBiT is **case** (pronounced as *bidirectional case*), which provides pattern matching and easy access to bidirectional branching, and also importantly, allows unrestricted use of λ-bound variables.

In general, a **case** expression has the following form.

$$\underline{\mathtt{case}}\ e\ \underline{\mathtt{of}}\ \{\,p_1 \to e_1\ \underline{\mathtt{with}}\ \phi_1\ \underline{\mathtt{by}}\ \rho_1;\ \ldots;\ p_n \to e_n\ \underline{\mathtt{with}}\ \phi_n\ \underline{\mathtt{by}}\ \rho_n\,\}$$

(Like Haskell, we shall omit "{", "}", and ";" when they are clear from the layout.) In the type system of HOBiT, a **case**-expression has type **B**B if e has type **B**A, each e<sub>i</sub> has type **B**B, and each φ<sub>i</sub> and ρ<sub>i</sub> have types B → *Bool* and A → B → A respectively, where A and B contain neither (→) nor **B**. The type **B**A can be understood intuitively as "updatable A". Typically, the source and view data are given such **B**-types, and a function of type **B**A → **B**B is the HOBiT equivalent of *Lens* A B.

The pattern-matching part of **case** performs two implicit operations: it first unwraps the **B**-typed value, exposing its content for normal pattern matching, and then it wraps the variables bound by the pattern matching, turning them into 'updatable' **B**-typed values to be used in the branch bodies. For example, in the second branch of *appendB*, a and x′ can be seen as having types A and [A] in the pattern, but **B**A and **B**[A] in the body; the bidirectional constructor (:) :: **B**A → **B**[A] → **B**[A] then combines them to produce a **B**-typed list.

In addition to the standard conditional branches, a **case**-expression has two extra components, φ<sub>i</sub> and ρ<sub>i</sub>, called *exit conditions* and *reconciliation functions* respectively, which are used in backwards execution. Exit condition φ<sub>i</sub> is an over-approximation of the forwards-execution results of the expression e<sub>i</sub>: if branch i is chosen, then φ<sub>i</sub> applied to the value of e<sub>i</sub> must evaluate to True. This assertion is checked dynamically in HOBiT, though it could be checked statically with

a sophisticated type system [7]. In the backwards direction, the exit conditions are used to decide branching: the branch whose exit condition is satisfied by the updated view is picked for execution (when more than one matches, the branch used in the original forwards execution has higher priority). The idea is that, due to the update on the view, the branch taken in the backwards direction may differ from the one taken in the original forwards execution, a feature commonly supported by lens languages [7] that we call *branch switching*.

Branch switching is crucial to *put*'s *robustness*, i.e., the ability to handle a wide range of view updates (including those that affect branching decisions) without failing. We explain how it works in detail in the following.
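The branch-selection discipline just described can be sketched as follows: a deliberately simplified Python model (not HOBiT's actual semantics; `put_case` and the example lens are ours, and reconciliation functions are omitted, so switching only succeeds when the new branch's put needs no reconciliation):

```python
def put_case(branches, source, view):
    """branches: list of (match, get, put_branch, exit_cond) tuples, where
    match(source) -> bool picks the forwards branch and exit_cond(view)
    -> bool approves a branch for the backwards direction."""
    original = next(i for i, b in enumerate(branches) if b[0](source))
    # Prefer the branch taken forwards; fall back to the others in order.
    order = [original] + [i for i in range(len(branches)) if i != original]
    for i in order:
        match, get, put_branch, exit_cond = branches[i]
        if exit_cond(view):
            return put_branch(source, view)
    raise ValueError("no branch's exit condition accepts the updated view")

# Example lens from (tag, list) to list: tag "L" views the list as is;
# tag "R" views it with a sentinel 0 consed on. Exit conditions mirror
# the f example: any list for "L", a non-empty list for "R".
branches = [
    (lambda s: s[0] == "L", lambda s: s[1],
     lambda s, v: ("L", v), lambda v: True),
    (lambda s: s[0] == "R", lambda s: [0] + s[1],
     lambda s, v: ("R", v[1:]), lambda v: len(v) > 0),
]

print(put_case(branches, ("L", [1, 2]), [4, 5]))  # ('L', [4, 5])
print(put_case(branches, ("R", [1, 2]), [4, 5]))  # ('R', [5])
print(put_case(branches, ("R", [1, 2]), []))      # ('L', []): branch switches
```

The last call shows branch switching: the updated view `[]` fails the "R" exit condition, so put falls through to the "L" branch instead of failing outright.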

**Branch Switching.** Being able to choose a different branch in the backwards direction only solves part of the problem. Let us consider the case where the forwards execution chooses the nth branch, and the backwards execution, based on the updated view, chooses the mth (m ≠ n) branch. In this case, the original value of the pattern-matched expression e, which is the reason for the nth branch being chosen, is not compatible with the *put* of the mth branch.

As an example, let us consider a simple function that pattern-matches on an *Either* structure and returns a list. Note that we have purposely omitted the reconciliation functions.

```
f :: B (Either [A] (A, [A])) → B [A]
f x = case x of
        Left ys       → ys     with λ_.True  {- no by here -}
        Right (y, ys) → y : ys with not ◦ null
```

We have said that functions of type **B**A → **B**B are also fully functioning lenses of type *Lens* A B. In HOBiT, the above code runs as follows, where HOBiT> is the prompt of HOBiT's read-eval-print loop, and :get and :put are meta-language operations to perform *get* and *put* respectively.

```
HOBiT> :get f (Left [1, 2, 3])
[1, 2, 3]
HOBiT> :get f (Right (1, [2, 3]))
[1, 2, 3]
HOBiT> :put f (Left [1, 2, 3]) [4, 5] -- The view [1, 2, 3] is updated to [4, 5].
Left [4, 5] -- Both exit conditions are true with [4, 5],
                                 -- so the original branch (Left) is taken.
HOBiT> :put f (Right (1, [2, 3])) [4, 5]
Right (4, [5]) -- Similar, but the original branch is Right.
HOBiT> :put f (Right (1, [2, 3])) [ ]
⊥ -- Branch switches, but computation fails.
```
As we have explained above, exit conditions are used to decide which branch will be used in the backwards direction. For the first and second evaluations of *put*, the exit conditions corresponding to the original branches were true for the updated view. For the last evaluation of *put*, since the exit condition of the original branch was false but that of the other branch was true, branch switching is required. However, a direct *put*-execution of f with the inputs Right (1, [2, 3]) and [ ] crashes (represented by ⊥ above), for a good reason, as the two inputs are in an inconsistent state with respect to f.

**Fig. 1.** Reconciliation function: assuming exit conditions φ<sub>m</sub> and φ<sub>n</sub> where φ<sub>m</sub> b<sub>n</sub> = False but φ<sub>n</sub> b<sub>n</sub> = True, and reconciliation functions ρ<sub>m</sub> and ρ<sub>n</sub>.

This is where reconciliation functions come into the picture. For the Left branch above, a sensible reconciliation function is (λ_.λ_.Left [ ]), which when applied turns the conflicting source Right (1, [2, 3]) into Left [ ], and consequently the *put*-execution may succeed with the new inputs and return Left [ ]. It is not difficult to verify that the "reconciled" *put*-execution still satisfies well-behavedness. Note that despite the similarity in types, reconciliation functions are not *put*; they merely provide a default source value to allow stuck *put*-executions to proceed. We visualise the effect of reconciliation functions in Fig. 1. The left-hand side is bidirectional execution without successful branch switching: since φ<sub>m</sub> b<sub>n</sub> is false (indicating that b<sub>n</sub> is not in the range of the mth branch), the execution of *put* must (rightfully) fail in order to guarantee well-behavedness. On the right-hand side, reconciliation function ρ<sub>n</sub> produces a suitable source from a<sub>m</sub> and b<sub>n</sub> (where φ<sub>n</sub> (*get* (ρ<sub>n</sub> a<sub>m</sub> b<sub>n</sub>)) is True), and *put* executes with b<sub>n</sub> and the new source ρ<sub>n</sub> a<sub>m</sub> b<sub>n</sub>. It is worth mentioning that branch switching with reconciliation functions does not compromise correctness: though the quality of the user-defined reconciliation functions affects robustness, as they may or may not be able to resolve conflicts, successful *put*-executions always guarantee well-behavedness, regardless of the involvement of reconciliation functions.
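As an illustration only, the backwards behaviour of f described above can be simulated in Python; the encoding of sources as tagged tuples, and the names `get_f`, `put_f`, `EXITS` and `RECONCILE`, are our assumptions, not part of HOBiT.

```python
# Sketch (Python, not HOBiT): backwards execution of f with exit
# conditions and reconciliation. Sources are ('Left', ys) or
# ('Right', (y, ys)); views are lists.
def get_f(src):
    tag, payload = src
    if tag == 'Left':
        return list(payload)
    y, ys = payload
    return [y] + list(ys)

# Exit conditions: branch 0 (Left) accepts any view; branch 1 (Right)
# requires a non-empty view.
EXITS = [lambda v: True, lambda v: len(v) > 0]
# Reconciliation for the Left branch, the paper's (λ_.λ_.Left []);
# the Right branch has none here.
RECONCILE = {0: lambda src, view: ('Left', [])}

def put_f(src, view):
    orig = 0 if src[0] == 'Left' else 1
    # The original branch has priority; otherwise switch branches.
    branch = orig if EXITS[orig](view) else 1 - orig
    if not EXITS[branch](view):
        raise ValueError('no branch accepts the view')
    if branch != orig:                      # branch switching:
        src = RECONCILE[branch](src, view)  # reconcile the source first
    if branch == 0:
        return ('Left', list(view))
    return ('Right', (view[0], list(view[1:])))
```

With the reconciliation function in place, the previously failing *put* with source Right (1, [2, 3]) and view [ ] now switches to the Left branch and returns Left [ ].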

**Revisiting** *appendB***.** Recall *appendB* from Sect. 1.1 (reproduced below).

```
appendB :: B [A] → B [A] → B [A]
appendB x y = case x of
    []     → y                 with λ_.True    by (λ_.λ_.[])
    a : x' → a : appendB x' y  with not ◦ null
```

The exit condition for the nil case always returns true, as there is no restriction on the value of *y*; for the cons case it requires the returned list to be non-empty. In the backwards direction, when the updated view is non-empty, both exit conditions will be true, and the original branch will be taken. This means that, since *appendB* is defined by recursion on x, the backwards execution will try to unroll the original recursion step by step (i.e., the cons branch will be taken as many times as the length of *x*) as long as the view remains non-empty. If an updated view list is shorter than *x*, then *not* ◦ *null* will become false before the unrolling finishes, the nil branch will be taken (branch switching), and the reconciliation function will be called.
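The unrolling behaviour just described can be mimicked in Python; `get_append`/`put_append` and the pair encoding of the two list sources are our illustration, not HOBiT's implementation.

```python
def get_append(src):
    x, y = src
    return x + y

def put_append(src, view):
    x, y = src
    # Unroll the original recursion on x while the view stays non-empty
    # (the cons branch's exit condition "not . null").
    x2, v = [], list(view)
    for _ in x:
        if not v:          # view exhausted: branch-switch to the nil case;
            break          # the reconciliation resets the remaining source.
        x2.append(v.pop(0))
    return (x2, v)         # leftover view elements go to the second list
```

These results agree with the *appendB′* REPL session shown later in this subsection.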

The definition of *appendB* is curried; straightforward uncurrying turns it into the standard form **B**A → **B**B that can be interpreted by HOBiT as a lens. The following HOBiT program is the bidirectional variant of *uncurry*.

```
uncurryB :: (B A → B B → B C) → B (A, B) → B C
uncurryB f z = let (x, y) = z in f x y
```

Here, **let** p = e **in** e′ is syntactic sugar for **case** e **of** {p → e′ **with** (λ_.True) **by** (λs.λ_.s)}, in which the reconciliation function is never called as there is only one branch. Let *appendB′* = *uncurryB appendB*; then we can run *appendB′* as follows:

```
HOBiT> :get appendB′ ([1, 2], [3, 4, 5])
[1, 2, 3, 4, 5]
HOBiT> :put appendB′ ([1, 2], [3, 4, 5]) [6, 7, 8, 9, 10]
([6, 7], [8, 9, 10]) -- No structural change, no branch switching.
HOBiT> :put appendB′ ([1, 2], [3, 4, 5]) [6, 7]
([6, 7], [])         -- No branch switching, still.
HOBiT> :put appendB′ ([1, 2], [3, 4, 5]) [6]
([6], [])            -- Branch switching happens and the recursion terminates early.
```

**Difference from Lens Combinators.** As mentioned above, the idea of branch switching can be traced back to lens languages. In particular, the design of **case** is inspired by the combinator *cond* [7]. Despite the similarities, it is important to recognise that **case** is not merely a more convenient syntax for *cond*: it crucially supports the unrestricted use of λ-bound variables. This more fundamental difference is the reason why we could define *appendB* in the conventional functional style, with the variables *x* and *y* used freely in the body of **case**. In other words, the novelty of HOBiT is its ability to combine traditional (higher-order) functional programming with the bidirectional constructs found in lens combinators, effectively establishing a new way of bidirectional programming.

### **2.2 A More Elaborate Example:** *linesB*

In addition to supporting convenient programming and robust *put* execution, the **case** construct can also be used to express intricate details of backwards behaviour. Let us consider the Haskell function *lines* as an example, which splits a string into a list of strings at newlines; for example, *lines* "AA\nBB\n" = ["AA", "BB"]. The last newline character in the input is optional: *lines* returns ["AA", "BB"] for both "AA\nBB\n" and "AA\nBB". Suppose that we want the backwards transformation of *lines* to exhibit a behaviour that depends on the original source:

**Fig. 2.** *linesB* and *breakNLB*

```
HOBiT> :put linesB "AA\nBB" ["a", "b"]
"a\nb"
HOBiT> :put linesB "AA\nBB" ["a", "b", "c"]
"a\nb\nc"
HOBiT> :put linesB "AA\nBB" ["a"]
"a"
HOBiT> :put linesB "AA\nBB\n" ["a", "b", "c"]
"a\nb\nc\n"
HOBiT> :put linesB "AA\nBB\n" ["a"]
"a\n"
```
This behaviour is achieved by the definition in Fig. 2, which makes good use of reconciliation functions. Note that we do not consider the contrived corner case where the string ends with duplicated newlines, as in "A\n\n". The function *breakNLB* splits a string at the first newline; since *breakNLB* is injective, its exit conditions and reconciliation functions are of little interest. The interesting part is the definition of *linesB*, particularly its use of reconciliation functions to track the existence of a last newline character. We first explain the branching structure of the program. On the top level, when the first line is removed from the input, the remaining string b may contain more lines, or be the end (represented by either the empty list or the singleton list ['\n']). If the first branch is taken, the returned result will be a list of more than one element. In the second branch, when it is the end of the text, b could contain a newline or simply be empty. We do not explicitly give patterns for the two cases as they have the same body f : [ ], but the reconciliation function distinguishes the two in order to preserve the original source structure in the backwards execution. Note that we intentionally use the same variable name b in the case analysis and the reconciliation function, to signify that the two represent the same source data. The use of argument b in the reconciliation functions serves the purpose of remembering the (non)existence of the last newline in the original source, which is then preserved in the new source.
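The source-dependent backwards behaviour of *linesB* can be sketched in Python; `get_lines`/`put_lines` are our names, and this sketch only models the trailing-newline bookkeeping that the reconciliation functions perform.

```python
def get_lines(s):
    # Like Haskell's lines for our purposes: split on '\n', with the
    # final newline optional ("AA\nBB\n" and "AA\nBB" both give two lines).
    body = s[:-1] if s.endswith('\n') else s
    return body.split('\n')

def put_lines(src, view):
    # Join the updated lines, preserving the (non)existence of the
    # original source's trailing newline -- the job done by the
    # reconciliation functions in linesB.
    suffix = '\n' if src.endswith('\n') else ''
    return '\n'.join(view) + suffix
```

This reproduces the REPL session above: sources without a final newline yield updated sources without one, and vice versa.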

**Fig. 3.** Syntax of HOBiT Core

It is worth noting that, just like the other examples we have seen, this definition in HOBiT shares a similar structure with a definition of *lines* in Haskell.<sup>1</sup> The notable difference is that a Haskell definition is likely to group the three cases of *lines* into two branches differently, as there is no need to keep track of the last newline for backwards execution. Recall that reconciliation functions are called *after* branches are chosen by exit conditions; in the case of *linesB*, the reconciliation function is used to decide whether the reconciled value of b is "\n" or "". This, however, means that we cannot split the pattern b into the two patterns "\n" and "" by copying its branch body and exit condition, because we would then lose the chance to choose a reconciled value of b based on its original value.

### **3 Syntax and Type System of HOBiT Core**

In this section, we describe the syntax and the type system of the core of HOBiT.

#### **3.1 Syntax**

The syntax of HOBiT Core is given in Fig. 3. For simplicity, we only consider booleans and lists. The syntax is almost the same as that of the standard λ-calculus with a fixed-point combinator (**fix**), lists and booleans. For data constructors and case expressions, there are in addition bidirectional versions, which are underlined. We allow the body of **fix** to be a non-λ expression to keep our semantics simple (Sect. 4), though a definition such as **fix** (λx.True : x) can diverge.

Although in the examples we used **case**-expressions (both unidirectional and bidirectional) with an arbitrary number of branches having overlapping patterns under the first-match principle, we assume for simplicity that in HOBiT Core such expressions must have exactly two branches whose patterns do not overlap; extensions to support these features are straightforward. As in Haskell, we sometimes omit the braces and semicolons when they are clear from the layout.

<sup>1</sup> Haskell's *lines*'s behaviour is a bit more complicated as it returns [ ] if and only if the input is "". This behaviour can be achieved by calling *linesB* only when the input list is nonempty.

**Fig. 4.** Typing rules: Δ ⊢ p : σ is similar to Γ ⊢ p : A but asserts that the resulting environment is actually a bidirectional environment.

#### **3.2 Type System**

The types in HOBiT Core are defined as follows.

$$A, B \:: = \mathbf{B}\sigma \mid A \to B \mid [A] \mid Bool$$

We use the metavariables σ, τ, . . . for types that contain neither → nor **B**. We call σ-types *pure datatypes*; they are used for the sources and views of lenses. Intuitively, **B**σ represents "updatable σ"—data subject to update in bidirectional transformation. We keep the type system of HOBiT Core simple, though it is possible to include polymorphic types or intersection types to unify unidirectional and bidirectional constructors.

The typing judgment Γ; Δ ⊢ e : A, which reads that under environments Γ and Δ expression e has type A, is defined by the typing rules in Fig. 4. We use two environments: Δ (the *bidirectional type environment*) is for variables introduced by pattern matching through **case**, and Γ is for everything else. It is interesting to observe that Δ holds only pure datatypes, as the pattern variables of **case** have pure datatypes, while Γ holds any types. We assume that the variables in Γ and those in Δ are disjoint, and that appropriate α-renaming has been done to ensure this. This separation of Δ from Γ does not affect typeability, but is key to our semantics and correctness proof (Sect. 4). Most of the rules are standard except that for **case**; recall that we use only unidirectional constructors in patterns, which have pure types, while the variables bound in the patterns are used as **B**-typed values in branch bodies.

### **4 Semantics of HOBiT Core**

Recall that the unique strength of HOBiT is its ability to mix higher-order unidirectional programming with bidirectional programming. A consequence of this mixture is that we can no longer specify its semantics in the same way as other *first-order* bidirectional languages such as [13], where two semantics—one for *get* and the other for *put*—suffice. This is because the category of lenses is believed to have no exponential objects [27] (and thus does not permit λs).

#### **4.1 Basic Idea: Staging**

Our solution to this problem is staging [5], which separates evaluation into two stages: the unidirectional parts are evaluated first to make way for a bidirectional semantics, which then only has to deal with the residual first-order programs. As a simple example, consider the expression (λz.z) (x : ((λw.w) y) : [ ]). The first-stage evaluation, e ⇓<sub>U</sub> E, eliminates λs from the expression, as in (λz.z) (x : ((λw.w) y) : [ ]) ⇓<sub>U</sub> x : y : [ ]. Then, our bidirectional semantics is able to treat the residual expression as a lens between value environments and values, following [13,20]. Specifically, we have the *get* evaluation relation μ ⊢<sub>G</sub> E ⇒ v, which computes the value v of E under environment μ as usual, and the *put* evaluation relation μ ⊢<sub>P</sub> v ⇐ E ⊣ μ′, which computes an updated environment μ′ for E from the updated view v and the original environment μ. In pseudo syntax, it can be understood as *put* E μ v = μ′, where μ represents the original source and μ′ the new source.

It is worth mentioning that a complete separation of the stages is not possible due to the combination of **fix** and **case**, as an attempt to fully evaluate them in the first stage would result in divergence. Thus, we delay the unidirectional evaluation inside **case** to allow **fix**, and consequently the three evaluation relations (unidirectional, *get*, and *put*) are mutually dependent.
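The first-stage evaluation on the example above can be sketched in Python; the tuple encoding of expressions and the name `eval_u` are our assumptions, for illustration.

```python
# Sketch: stage 1 (unidirectional evaluation) eliminates lambdas,
# leaving a first-order residual expression that may mention free
# variables. Expressions are encoded as tuples:
#   ('lam', x, e)  ('app', f, a)  ('var', x)  ('cons', h, t)  ('nil',)
def eval_u(e, env=None):
    env = env or {}
    tag = e[0]
    if tag == 'var':
        return env.get(e[1], e)        # free variables stay residual
    if tag == 'lam':
        return e                       # lambdas are values at this stage
    if tag == 'app':
        f = eval_u(e[1], env)
        a = eval_u(e[2], env)
        assert f[0] == 'lam'
        return eval_u(f[2], {**env, f[1]: a})
    if tag == 'cons':
        return ('cons', eval_u(e[1], env), eval_u(e[2], env))
    return e                           # 'nil' and other constants

# (λz.z) (x : ((λw.w) y) : [])  ⇓U  x : y : []
idz = ('lam', 'z', ('var', 'z'))
idw = ('lam', 'w', ('var', 'w'))
expr = ('app', idz, ('cons', ('var', 'x'),
                     ('cons', ('app', idw, ('var', 'y')), ('nil',))))
```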

#### **4.2 Three Evaluation Relations: Unidirectional,** *get* **and** *put*

First, we formally define the set of residual expressions:

E ::= True | False | [ ] | E<sub>1</sub> : E<sub>2</sub> | λx.e | x | **True** | **False** | **[ ]** | E<sub>1</sub> **:** E<sub>2</sub> | **case** E<sub>0</sub> **of** {p<sub>i</sub> → e<sub>i</sub> **with** E′<sub>i</sub> **by** E″<sub>i</sub>}<sub>i=1,2</sub>

where the second group of constructors and **case** are the bidirectional (underlined) forms.

They are treated as values in the unidirectional evaluation, and as expressions in the *get* and *put* evaluations. Notice that e and e<sub>i</sub> appear under λ and **case** respectively, meaning that their evaluations are delayed.

The set of *(first-order) values* is defined as below.

$$
v ::= \mathsf{True} \mid \mathsf{False} \mid [\,] \mid v_1 : v_2
$$

Accordingly, we define a *(first-order) value environment* μ as a finite mapping from variables to first-order values.


**Fig. 5.** Evaluation rules for unidirectional parts (excerpt)

**Unidirectional Evaluation Relation.** The rules for the unidirectional evaluation relation are rather standard, as excerpted in Fig. 5. The bidirectional constructs (i.e., bidirectional constructors and **case**) are frozen, i.e., they behave just like ordinary constructors in this evaluation. Notice that we can evaluate an expression containing free variables; the resulting residual expression may then contain those free variables.

**Bidirectional** (*get* **and** *put*) **Evaluation Relations.** The *get* and *put* evaluation relations, μ ⊢<sub>G</sub> E ⇒ v and μ ⊢<sub>P</sub> v ⇐ E ⊣ μ′, are defined so that they together form a lens.

*Weakening of Environment.* Before we lay out the semantics, it is worth explaining a subtlety in environment handling. In conventional evaluation semantics, a larger-than-necessary environment does no harm, as long as there are no name clashes. For example, whether the expression x is evaluated under the environment {x = 1} or {x = 1, y = 2} does not matter. However, the same is not true for bidirectional evaluation. Let us consider a residual expression E = x : y : [ ], and a value environment μ = {x = 1, y = 2} as the original source. We expect to have μ ⊢<sub>G</sub> E ⇒ 1 : 2 : [ ], which may be derived as:

$$
\dfrac{\mu \vdash_{G} x \Rightarrow 1 \qquad \dfrac{\mu \vdash_{G} y \Rightarrow 2 \qquad \mu \vdash_{G} [\,] \Rightarrow [\,]}{\mu \vdash_{G} y : [\,] \Rightarrow 2 : [\,]}}{\mu \vdash_{G} x : y : [\,] \Rightarrow 1 : 2 : [\,]}
$$

In the *put* direction, for an updated view, say 3 : 4 : [ ], we expect to have μ ⊢<sub>P</sub> 3 : 4 : [ ] ⇐ E ⊣ {x = 3, y = 4}, with the corresponding derivation:

$$
\dfrac{\mu \vdash_{P} 3 \Leftarrow x \dashv\ ?_1 \qquad \mu \vdash_{P} 4 : [\,] \Leftarrow y : [\,] \dashv\ ?_2}{\mu \vdash_{P} 3 : 4 : [\,] \Leftarrow x : y : [\,] \dashv \{x = 3, y = 4\}}
$$

What should the environments ?<sub>1</sub> and ?<sub>2</sub> be? One way is to have μ ⊢<sub>P</sub> 3 ⇐ x ⊣ {x = 3, y = 2} and μ ⊢<sub>P</sub> 4 : [ ] ⇐ y : [ ] ⊣ {x = 1, y = 4}, where the variables that do not appear free in the residual expression take their values from the original source environment μ. However, the evaluation will get stuck here, as there is no reasonable way to produce the expected result {x = 3, y = 4} from ?<sub>1</sub> = {x = 3, y = 2} and ?<sub>2</sub> = {x = 1, y = 4}. In other words, the redundancy in environments is harmful, as it may cause conflicts downstream.

Our solution to this problem, which follows [21–23,29], is to allow *put* to return value environments containing only the bindings that are relevant to the residual expression under evaluation. For example, we have μ ⊢<sub>P</sub> 3 ⇐ x ⊣ {x = 3} and μ ⊢<sub>P</sub> 4 : [ ] ⇐ y : [ ] ⊣ {y = 4}. Then, we can merge the two value environments ?<sub>1</sub> = {x = 3} and ?<sub>2</sub> = {y = 4} to obtain the expected result {x = 3, y = 4}. As a remark, this seemingly simple solution actually has a nontrivial effect on the reasoning about well-behavedness. We defer a detailed discussion of this to Sect. 4.3.

Now we are ready to define the *get* and *put* evaluation rules for each bidirectional construct. For variables, we just look up or update environments. Recall that μ is a mapping (i.e., a function) from variables to (first-order) values, for which we use a record-like notation such as {x = v}.

$$\overline{\mu \vdash_{G} x \Rightarrow \mu(x)} \qquad \overline{\mu \vdash_{P} v \Leftarrow x \dashv \{x = v\}}$$

For each constant c ∈ {True, False, [ ]}, the evaluation rules for its bidirectional version are straightforward.

$$
\overline{\mu \vdash_{G} \underline{c} \Rightarrow c} \qquad \overline{\mu \vdash_{P} c \Leftarrow \underline{c} \dashv \emptyset}
$$

The above-mentioned behaviour of the bidirectional cons expression E<sup>1</sup> : E<sup>2</sup> is formally given as:

$$\dfrac{\mu \vdash_{G} E_1 \Rightarrow v_1 \quad \mu \vdash_{G} E_2 \Rightarrow v_2}{\mu \vdash_{G} E_1 \,\underline{:}\, E_2 \Rightarrow v_1 : v_2} \qquad \dfrac{\mu \vdash_{P} v_1 \Leftarrow E_1 \dashv \mu'_1 \quad \mu \vdash_{P} v_2 \Leftarrow E_2 \dashv \mu'_2}{\mu \vdash_{P} v_1 : v_2 \Leftarrow E_1 \,\underline{:}\, E_2 \dashv \mu'_1 \Cup \mu'_2}$$

(Note that the variable rules guarantee that only free variables of the residual expressions end up in the resulting environments.) Here, ⋓ is the merging operator defined by: μ ⋓ μ′ = μ ∪ μ′ if there is no x such that μ(x) ≠ μ′(x), and undefined otherwise. For example, {x = 3} ⋓ {y = 4} = {x = 3, y = 4} and {x = 3, y = 4} ⋓ {y = 4} = {x = 3, y = 4}, but {x = 3, y = 2} ⋓ {y = 4} is undefined.
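A direct Python transcription of the merging operator (a sketch; the name `merge` is ours):

```python
def merge(m1, m2):
    """The merging operator: the union of two environments, defined
    only when they agree on every shared variable."""
    for x in m1.keys() & m2.keys():
        if m1[x] != m2[x]:
            raise ValueError(f"conflict on {x}: {m1[x]} vs {m2[x]}")
    return {**m1, **m2}

assert merge({'x': 3}, {'y': 4}) == {'x': 3, 'y': 4}
assert merge({'x': 3, 'y': 4}, {'y': 4}) == {'x': 3, 'y': 4}
# merge({'x': 3, 'y': 2}, {'y': 4}) is undefined and raises ValueError.
```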

The most interesting rules are those for **case**. In the *get* direction, it is not different from the ordinary **case** except that exit conditions are asserted, as shown in Fig. 6. We use the following predicate for pattern matching.

$$\mathsf{match}(p_k, v_0, \mu_k) = (p_k \mu_k = v_0) \land (\mathsf{dom}(\mu_k) = \mathsf{fv}(p_k))$$

Here, we abuse notation and write p<sub>k</sub>μ<sub>k</sub> for the value obtained from p<sub>k</sub> by replacing each free variable x in p<sub>k</sub> with μ<sub>k</sub>(x). One might notice that we have the disjoint union μ ⊎ μ<sub>i</sub> in Fig. 6, where μ<sub>i</sub> holds the values of the variables in p<sub>i</sub>; this is possible because we assume α-renaming of bound variables that is consistent between *get* and *put*. Recall that p<sub>1</sub> and p<sub>2</sub> are assumed not to overlap, and hence the evaluation is deterministic. Note that the reconciliation functions E″<sub>i</sub> are untouched by the rule.
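The match predicate can be sketched in Python over a small pattern representation (the tuple encoding of patterns is our assumption):

```python
def match(pattern, value):
    # match(p, v, mu): returns mu with dom(mu) = fv(p) if p matches v,
    # and None otherwise. Patterns: ('var', x), ('cons', p1, p2),
    # ('nil',), or a constant such as True/False.
    if isinstance(pattern, tuple) and pattern[0] == 'var':
        return {pattern[1]: value}
    if isinstance(pattern, tuple) and pattern[0] == 'cons':
        if not (isinstance(value, list) and value):
            return None
        m1 = match(pattern[1], value[0])
        m2 = match(pattern[2], value[1:])
        if m1 is None or m2 is None:
            return None
        return {**m1, **m2}   # pattern variables are distinct (linear patterns)
    if isinstance(pattern, tuple) and pattern[0] == 'nil':
        return {} if value == [] else None
    return {} if pattern == value else None
```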

The *put* evaluation rule of **case** shown in Fig. 6 is more involved. In addition to checking which branch should be chosen by using exit conditions, we need two rules to handle the cases with and without branch switching. Basically,

$$\dfrac{\mu \vdash_{G} E_0 \Rightarrow v_0 \quad \mathsf{match}(p_i, v_0, \mu_i) \quad e_i \Downarrow_{U} E_i \quad \mu \uplus \mu_i \vdash_{G} E_i \Rightarrow v \quad E'_i\, v \Downarrow_{U} \mathsf{True}}{\mu \vdash_{G} \underline{\mathtt{case}}\ E_0\ \underline{\mathtt{of}}\ \{p_i \to e_i\ \underline{\mathtt{with}}\ E'_i\ \underline{\mathtt{by}}\ E''_i\}_{i=1,2} \Rightarrow v}$$

$$\dfrac{\begin{array}{c}
\mu \vdash_{G} E_0 \Rightarrow v_0 \quad \mathsf{match}(p_i, v_0, \mu_i) \quad E'_i\, v \Downarrow_{U} \mathsf{False} \quad j = 3 - i \quad E'_j\, v \Downarrow_{U} \mathsf{True} \quad e_j \Downarrow_{U} E_j \\
E''_j\, v_0\, v \Downarrow_{U} u_0 \quad \mathsf{match}(p_j, u_0, \mu_j) \\
\mu \uplus \mu_j \vdash_{P} v \Leftarrow E_j \dashv \mu' \uplus_{\mathsf{dom}(\mu),\mathsf{dom}(\mu_j)} \mu'_j \quad v'_0 = p_j(\mu'_j \lhd \mu_j) \quad \mu \vdash_{P} v'_0 \Leftarrow E_0 \dashv \mu'_0
\end{array}}{\mu \vdash_{P} v \Leftarrow \underline{\mathtt{case}}\ E_0\ \underline{\mathtt{of}}\ \{p_i \to e_i\ \underline{\mathtt{with}}\ E'_i\ \underline{\mathtt{by}}\ E''_i\}_{i=1,2} \dashv \mu'_0 \Cup \mu'}$$

**Fig. 6.** *get*- and *put*-evaluation of **case**: we write μ ⊎<sub>X,Y</sub> μ′ to ensure that dom(μ) ⊆ X and dom(μ′) ⊆ Y.

the branch to be taken in the backwards direction is decided first, by the *get*-evaluation of the case condition E<sub>0</sub> and the checking of the exit conditions E′<sub>i</sub> against the updated view v. After that, the body e<sub>i</sub> of the chosen branch is first evaluated unidirectionally, and then its residual expression E<sub>i</sub> is *put*-evaluated. The last step is the *put*-evaluation of the case condition E<sub>0</sub>. When branch switching happens, there is the additional step of applying the reconciliation function E″<sub>j</sub>.

Note the use of the operator ◁ in computing the updated case condition v′<sub>0</sub>:

$$(\mu' \lhd \mu)(x) = \begin{cases} \mu'(x) & \text{if } x \in \mathsf{dom}(\mu')\\ \mu(x) & \text{otherwise} \end{cases}$$

Recall that at the beginning of this subsection we discussed our approach of avoiding conflicts by producing environments with only the relevant variables. This means that the μ′<sub>i</sub> above contains only variables that appear free in E<sub>i</sub>, which may or may not be all the variables in p<sub>i</sub>. Since this is the point where these variables are introduced, we need to supplement μ′<sub>i</sub> with μ<sub>i</sub> from the original pattern matching so that p<sub>i</sub> can be properly instantiated.

**Construction of Lens.** Let us write L<sub>0</sub>[[E]] for a lens between value environments and values, defined as:

$$\begin{array}{ll}
\text{get } \mathcal{L}_0[E]\ \mu = v & \text{if } \mu \vdash_{G} E \Rightarrow v \\
\text{put } \mathcal{L}_0[E]\ \mu\ v = \mu' & \text{if } \mu \vdash_{P} v \Leftarrow E \dashv \mu'
\end{array}$$

Then, we can define the lens L[[e]] induced by e (a closed function expression) as follows, where e x ⇓<sub>U</sub> E for some fresh variable x.

$$\begin{array}{l}
\text{get } \mathcal{L}[e]\ s = \text{get } \mathcal{L}_0[E]\ \{x = s\} \\
\text{put } \mathcal{L}[e]\ s\ v = (\mu' \lhd \{x = s\})(x) \quad \text{where } \mu' = \text{put } \mathcal{L}_0[E]\ \{x = s\}\ v
\end{array}$$

Actually, :get and :put in Sect. 2 are realised by *get* L[[e]] and *put* L[[e]].
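A Python sketch of this construction (the names `default` and `lens_from_env_lens` are ours): an environment-level lens between {x = s} and views is wrapped into a source-level lens, with ◁ restoring x when *put* omits it.

```python
def default(m_new, m_old):
    # (mu' ◁ mu)(x): prefer mu', fall back to mu.
    return {**m_old, **m_new}

def lens_from_env_lens(get0, put0):
    # L[e] from L0[E]: a lens between environments {x = s} and values
    # becomes a lens between sources s and views v.
    def get(s):
        return get0({'x': s})
    def put(s, v):
        mu = put0({'x': s}, v)       # may omit x if x is irrelevant to E
        return default(mu, {'x': s})['x']
    return get, put

# Example (ours): E is just the variable x, so L0 looks x up / rebinds it.
get_l, put_l = lens_from_env_lens(lambda mu: mu['x'],
                                  lambda mu, v: {'x': v})
assert get_l(5) == 5 and put_l(5, 7) == 7

# A put that drops the (irrelevant) binding still works: ◁ restores x.
get_c, put_c = lens_from_env_lens(lambda mu: True, lambda mu, v: {})
assert put_c(5, True) == 5
```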

### **4.3 Correctness**

We establish the correctness of HOBiT Core: L[[e]] ∈ *Lens* σ τ is well-behaved for closed e of type **B**σ → **B**τ. Recall that *Lens* S V is a set of lenses ℓ, where *get* ℓ ∈ S → V and *put* ℓ ∈ S → V → S. We only provide proof sketches in this subsection due to space limitations.

**⪯-well-behavedness.** Recall that in the previous subsection we allowed environments to be weakened during *put*-evaluation. Since not all variables in a source may appear in the view, during some intermediate evaluation steps (for example within **case**-branches) the weakened environment may not be sufficient to fully construct a new source. Recall that, in μ ⊢<sub>P</sub> v ⇐ e ⊣ μ′, dom(μ′) can be smaller than dom(μ), a gap that is fixed at a later stage of evaluation by merging (⋓) and defaulting (◁) with other environments. This technique reduces conflicts, but at the same time complicates the compositional reasoning about correctness. Specifically, due to the potentially missing information in the intermediate environments, well-behavedness may be temporarily broken during evaluation. Instead, we use a variant of well-behavedness that is weakening-aware, which will then be used to establish the standard well-behavedness of the final result.

**Definition 1 (⪯-well-behavedness).** Let (S, ⪯) and (V, ⪯) be partially ordered sets. A lens ℓ ∈ *Lens* S V is called *⪯-well-behaved* if it satisfies

$$\begin{aligned} \text{get } \ell\ s = v &\implies v \text{ is maximal} \land (\forall v'.\ v' \preceq v \implies \text{put } \ell\ s\ v' \preceq s) && (\preceq\text{-}\textbf{Acceptability}) \\ \text{put } \ell\ s\ v = s' &\implies (\forall s''.\ s' \preceq s'' \implies v \preceq \text{get } \ell\ s'') && (\preceq\text{-}\textbf{Consistency}) \end{aligned}$$

for any s, s′ ∈ S and v ∈ V, where s is maximal.

We write *Lens*<sup>⪯wb</sup> S V for the set of lenses in *Lens* S V that are ⪯-well-behaved. In this section, we only consider the case where S and V are value environments and first-order values respectively, where value environments are ordered by weakening (μ ⪯ μ′ if μ(x) = μ′(x) for all x ∈ dom(μ)), and (⪯) = (=) for first-order values. In Sect. 5.2 we consider a slightly more general situation.

The ⪯-well-behavedness is a generalisation of the ordinary well-behavedness, as the two coincide when (⪯) = (=).

**Theorem 1.** *For* S *and* V *with* (⪯) = (=)*, a lens* ℓ ∈ *Lens* S V *is ⪯-well-behaved iff it is well-behaved.*
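For the (⪯) = (=) case, ordinary well-behavedness can be checked exhaustively on finite domains; this Python helper and the sample lenses are our illustration.

```python
def well_behaved(get, put, sources, views):
    # Acceptability: put s (get s) = s.  Consistency: get (put s v) = v
    # whenever put succeeds (a failing put raises ValueError here).
    for s in sources:
        if put(s, get(s)) != s:
            return False
        for v in views:
            try:
                s2 = put(s, v)
            except ValueError:
                continue
            if get(s2) != v:
                return False
    return True

# The pair-projection lens is well-behaved on a small domain...
sources = [(a, b) for a in range(3) for b in range(3)]
assert well_behaved(lambda s: s[0],
                    lambda s, v: (v, s[1]),
                    sources, list(range(3)))
# ...while a put that ignores the view is not.
assert not well_behaved(lambda s: s[0],
                        lambda s, v: (0, s[1]),
                        sources, list(range(3)))
```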

**Kripke Logical Relation.** The key step in proving the correctness of HOBiT Core is to prove that L<sub>0</sub>[[E]] is always ⪯-well-behaved when E is the evaluation result of a well-typed expression e. The basic idea is to prove, by a logical relation, that an expression e of type **B**σ under context Δ evaluates (assuming termination) to an E such that L<sub>0</sub>[[E]] is a ⪯-well-behaved lens between [[Δ]] and [[σ]].

Usually a logical relation is defined only by induction on the type. In our case, as we need to consider Δ in the interpretation of **B**σ, the relation should be indexed by Δ too. However, naive indexing does not work, due to substitutions. For example, we could define a (unary) relation E<sub>Δ</sub>(**B**σ) as a set of expressions that evaluate to "good" (i.e., ⪯-well-behaved) lenses between (the semantics of) Δ and σ, and E<sub>Δ</sub>(**B**σ → **B**τ) as a set of expressions that evaluate to "good" functions that map good lenses between Δ and σ to those between Δ and τ. This naive relation, however, does not respect substitution, which can substitute a value obtained from an expression typed under Δ for a variable typed under Δ′ with Δ ⊆ Δ′, where Δ and Δ′ need not be the same. With the naive definition, good functions at Δ need not be good functions at Δ′, as a good lens between Δ and σ is not always a good lens between Δ′ and σ.

To remedy the situation, inspired by the denotational semantics of [24], we use Kripke logical relations [18] whose worlds are the Δs.

**Definition 2.** We define the set E<sub>Δ</sub>[A] of expressions, the set R<sub>Δ</sub>[A] of residual expressions, the set [σ] of values and the set [Δ] of value environments as below.

$$\begin{aligned}
\mathcal{E}_{\Delta}[A] &= \{e \mid \forall E.\ e \Downarrow_{U} E \text{ implies } E \in \mathcal{R}_{\Delta}[A]\} \\
\mathcal{R}_{\Delta}[Bool] &= \{\mathsf{True}, \mathsf{False}\} \\
\mathcal{R}_{\Delta}[[A]] &= List\ \mathcal{R}_{\Delta}[A] \\
\mathcal{R}_{\Delta}[\mathbf{B}\sigma] &= \{E \mid \forall \Delta'.\ \Delta \subseteq \Delta' \text{ implies } \mathcal{L}_0[E] \in Lens^{\preceq\mathrm{wb}}\ [\Delta']\ [\sigma]\} \\
\mathcal{R}_{\Delta}[A \to B] &= \{F \mid \forall \Delta'.\ \Delta \subseteq \Delta' \text{ implies } (\forall E \in \mathcal{R}_{\Delta'}[A].\ F\ E \in \mathcal{E}_{\Delta'}[B])\} \\
[Bool] &= \{\mathsf{True}, \mathsf{False}\} \\
[[\sigma]] &= List\ [\sigma] \\
[\Delta] &= \{\mu \mid \mathsf{dom}(\mu) \subseteq \mathsf{dom}(\Delta) \text{ and } \forall x \in \mathsf{dom}(\mu).\ \mu(x) \in [\Delta(x)]\}
\end{aligned}$$

Here, for a set S, *List* S is inductively defined as: [ ] ∈ *List* S, and s : t ∈ *List* S for all s ∈ S and t ∈ *List* S.

The notable difference from ordinary logical relations is the definition of RΔ[A → B], where we consider an arbitrary Δ′ such that Δ ⊆ Δ′. This is the key to establishing RΔ[A] ⊆ RΔ′[A] whenever Δ ⊆ Δ′. Notice that ⟦σ⟧ = RΔ[σ] for any Δ. We have the following lemmas.

**Lemma 1.** *If* Δ ⊆ Δ′*, then* v ∈ RΔ[A] *implies* v ∈ RΔ′[A]*.*

**Lemma 2.** x ∈ RΔ[**B**σ] *for any* Δ *such that* Δ(x) = σ*.*

**Lemma 3.** *For any* σ *and* Δ*,* True, False ∈ RΔ[**B***Bool*] *and* [ ] ∈ RΔ[**B**[σ]]*.*

**Lemma 4.** *If* E₁ ∈ RΔ[**B**σ] *and* E₂ ∈ RΔ[**B**[σ]]*, then* E₁ : E₂ ∈ RΔ[**B**[σ]]*.*

**Lemma 5.** *Let* σ *and* τ *be pure types and* Δ *a pure type environment. Suppose that* eᵢ ∈ EΔ⊎Δᵢ[**B**τ] *for* Δᵢ ⊢ pᵢ : σ *(*i = 1, 2*), and that* E₀ ∈ RΔ[**B**σ]*,* E′₁, E′₂ ∈ RΔ[τ → *Bool*] *and* E″₁, E″₂ ∈ RΔ[σ → τ → σ]*. Then,* **case** E₀ **of** {pᵢ → eᵢ **with** E′ᵢ **by** E″ᵢ} (i = 1, 2) ∈ RΔ[**B**τ]*.*

*Proof (Sketch).* The proof itself is straightforward by case analysis. The key property is that *get* and *put* use the same branches in the proofs of both ⊑-**Acceptability** and ⊑-**Consistency**. Slight care is required for the unidirectional evaluations of e₁ and e₂, and for the applications of E′₁, E′₂, E″₁ and E″₂. However, the semantics is carefully designed so that, in the proof of ⊑-**Acceptability**, the unidirectional evaluations that happen in *put* have already happened in the evaluation of *get*, and a similar discussion applies to ⊑-**Consistency**.

As a remark, recall that we assumed α-renaming of the pᵢ so that the disjoint unions (⊎) in Fig. 6 succeed. This renaming depends on the μs received in the *get* and *put* evaluations, and can be realised by using de Bruijn levels.

**Lemma 6 (Fundamental Lemma).** *For* Γ; Δ ⊢ e : A*, for any* Δ′ *with* Δ ⊆ Δ′ *and* Eₓ ∈ RΔ′[Γ(x)]*, we have* e[Eₓ/x]ₓ ∈ EΔ′[A]*.*

*Proof (Sketch).* We prove the lemma by induction on the typing derivation. For the bidirectional constructs, we just apply the above lemmas appropriately. The other parts are rather routine.

Now we are ready to state the correctness of our construction of lenses.

**Corollary 1.** *If* ε; ε ⊢ e : **B**σ → **B**τ*, then* e x ∈ E{x:σ}[**B**τ]*.*

**Lemma 7.** *If* e ∈ E{x:σ}[**B**τ]*, then* L⟦e⟧ *(if defined) is in* Lens^⊑wb σ τ *(and thus well-behaved by Theorem 1).*

**Theorem 2.** *If* ε; ε ⊢ e : **B**σ → **B**τ*, then* L⟦e⟧ ∈ *Lens* σ τ *(if defined) is well-behaved.*

### **5 Extensions**

Before presenting a larger example, we discuss a few extensions of HOBiT Core which facilitate programming.

### **5.1 In-Language Lens Definition**

In HOBiT programming, it is still sometimes useful to allow manually defined primitive lenses (i.e., lenses constructed from independently specified *get*/*put* functions), for backwards compatibility and also for programs with relatively simple computation logic but complicated backwards behaviours. This feature is supported by the construct **appLens** e₁ e₂ e₃ in HOBiT. For example, we can write *incB* x = **appLens** (λs.s + 1) (λ_.λv.v − 1) x to define a bidirectional increment function *incB* :: **B***Int* → **B***Int*. Note that, for simplicity, we require the additional expression x (represented by e₃ in the general case) to convert between normal functions and lenses. The typing rule for **appLens** e₁ e₂ e₃ is as below.

$$\frac{\Gamma; \Delta \vdash e\_1 : \sigma \to \tau \quad \Gamma; \Delta \vdash e\_2 : \sigma \to \tau \to \sigma \quad \Gamma; \Delta \vdash e\_3 : \mathbf{B}\sigma}{\Gamma; \Delta \vdash \underline{\mathbf{appLens}}\ e\_1\ e\_2\ e\_3 : \mathbf{B}\tau}$$

Accordingly, we add the following unidirectional evaluation rule.

$$\begin{array}{c} \begin{array}{c} e\_i \ \Downarrow\_{\text{U}} \ E\_i \quad (i = 1, 2, 3) \\ \hline \underline{\mathtt{appLens}} \ e\_1 \ e\_2 \ e\_3 \Downarrow\_{\text{U}} \underline{\mathtt{appLens}} \ E\_1 \ E\_2 \ E\_3 \end{array} \\ \end{array}$$

Also, we add the following *get*/*put* evaluation rules for **appLens**.

$$\frac{\mu \vdash\_{\mathsf{G}} E\_3 \Rightarrow v \quad E\_1\ v \Downarrow\_{\mathsf{U}} u}{\mu \vdash\_{\mathsf{G}} \underline{\mathtt{appLens}}\ E\_1\ E\_2\ E\_3 \Rightarrow u} \quad \frac{\mu \vdash\_{\mathsf{G}} E\_3 \Rightarrow v \quad E\_2\ v\ u' \Downarrow\_{\mathsf{U}} v' \quad \mu \vdash\_{\mathsf{P}} v' \Leftarrow E\_3 \dashv \mu'}{\mu \vdash\_{\mathsf{P}} u' \Leftarrow \underline{\mathtt{appLens}}\ E\_1\ E\_2\ E\_3 \dashv \mu'}$$

Notice that **appLens** e₁ e₂ e₃ is "good" if e₃ is, i.e., **appLens** e₁ e₂ e₃ ∈ EΔ[**B**τ] if e₃ ∈ EΔ[**B**σ], provided that the *get*/*put* pair (e₁, e₂) is well-behaved.
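To make the idea concrete, here is a minimal Haskell sketch of **appLens** viewed as an ordinary function on lenses. The `Lens` record, `appLens` and `incB` names are ours, for illustration; this is not HOBiT's actual implementation.

```haskell
-- A minimal model of lenses as get/put pairs (illustrative sketch).
data Lens s v = Lens { get :: s -> v, put :: s -> v -> s }

-- appLens packages an independently written get/put pair as a lens.
appLens :: (s -> v) -> (s -> v -> s) -> Lens s v
appLens = Lens

-- The bidirectional increment from the text: get adds one, put subtracts one.
incB :: Lens Int Int
incB = appLens (\s -> s + 1) (\_ v -> v - 1)
```

For example, `get incB 3` yields `4` and `put incB 3 10` yields `9`; the pair satisfies the well-behavedness laws (`put incB s (get incB s) == s` and `get incB (put incB s v) == v`).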

#### **5.2 Lens Combinators as Language Constructs**

In this paper, we have focused on the **case** construct, which is inspired by the *cond* combinator [7]. Although *cond* is certainly an important lens combinator, it is not the only one worth considering. Actually, we can obtain language constructs from a number of lens combinators, including those that take care of alignment [2]. For the sake of demonstration, we outline the derivation of a simpler example *comb* ∈ *Lens* σ τ → *Lens* σ′ τ′. As the construction depends solely on types, we purposely leave the combinator abstract.

A naive way of lifting combinators can already be found in [21,23]. For example, for *comb*, we might prepare the construct **comb**bad with the following typing rule (where ε is the empty environment):

$$\frac{\varepsilon; \varepsilon \vdash e : \mathbf{B}\sigma \to \mathbf{B}\tau \quad \Gamma; \Delta \vdash e' : \mathbf{B}\sigma'}{\Gamma; \Delta \vdash \underline{\mathbf{comb}}\_{\mathrm{bad}}\ e\ e' : \mathbf{B}\tau'}$$

Notice that in this version e is required to be closed so that we can turn the function directly into a lens by L⟦−⟧, and the evaluation of **comb**bad can then be based on standard lens composition: L₀⟦**comb**bad E E′⟧ = *comb* L⟦E⟧ ◦̂ L₀⟦E′⟧ (we omit the straightforward concrete evaluation rules), where E and E′ are the unidirectional evaluation results of e and e′ (notice that a residual expression is also an expression), and ◦̂ is the lens composition combinator [7] defined by:

$$\begin{array}{l} (\hat{\circ}) \in \mathit{Lens}\ B\ C \to \mathit{Lens}\ A\ B \to \mathit{Lens}\ A\ C \\ \mathit{get}\ (\ell\_2 \mathbin{\hat{\circ}} \ell\_1)\ a\ \ = \mathit{get}\ \ell\_2\ (\mathit{get}\ \ell\_1\ a) \\ \mathit{put}\ (\ell\_2 \mathbin{\hat{\circ}} \ell\_1)\ a\ c' = \mathit{put}\ \ell\_1\ a\ (\mathit{put}\ \ell\_2\ (\mathit{get}\ \ell\_1\ a)\ c') \end{array}$$
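The composition equations above can be transcribed directly into a Haskell sketch; the `Lens` record and the example lenses `fstL` and `negL` are our own illustrations, not part of the paper's development.

```haskell
-- Lenses as get/put pairs (illustrative sketch).
data Lens a b = Lens { get :: a -> b, put :: a -> b -> a }

-- Lens composition: the outer lens l2 works on the view of the inner lens l1.
compose :: Lens b c -> Lens a b -> Lens a c
compose l2 l1 = Lens g p
  where
    g a    = get l2 (get l1 a)
    p a c' = put l1 a (put l2 (get l1 a) c')

-- Two small well-behaved lenses for illustration.
fstL :: Lens (Int, Int) Int
fstL = Lens fst (\(_, y) x' -> (x', y))

negL :: Lens Int Int
negL = Lens negate (\_ v -> negate v)
```

For instance, `get (compose negL fstL) (3, 4)` yields `-3`, and `put (compose negL fstL) (3, 4) (-7)` yields `(7, 4)`.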

The combinator preserves ⊑-well-behavedness, and thus **comb**bad guarantees correctness. However, as discussed extensively in the case of **case**, this "closedness" requirement prevents flexible use of variables and creates a major obstacle in programming.

So, instead of the plain *comb*, we shall assume a parameterised version *pcomb* ∈ *Lens* (T × σ) τ → *Lens* (T × σ′) τ′ that allows each source to have an extra component T, which is expected to be kept track of by the combinator without modification. Here, T is assumed to have a partial merging operator of type T → T → T and a minimum element, and *pcomb* may use these facts in its definition. By using *pcomb*, we can give a corresponding language construct **comb** with a binder, typed as follows.

$$\frac{\Gamma; \Delta, x: \sigma \vdash e: \mathbf{B}\tau \quad \Gamma; \Delta \vdash e': \mathbf{B}\sigma'}{\Gamma; \Delta \vdash \underline{\mathbf{comb}} \; (x.e) \; e': \mathbf{B}\tau'}$$

We give its unidirectional evaluation rule as

$$\frac{e \Downarrow\_{\mathrm{U}} E \quad e' \Downarrow\_{\mathrm{U}} E'}{\underline{\mathtt{comb}}\ (x.e)\ e' \Downarrow\_{\mathrm{U}} \underline{\mathtt{comb}}\ E\ E'}$$

We omit the *get*/*put* evaluation rules, which are straightforwardly obtained from the following equation.

$$\mathcal{L}\_0[\![\underline{\mathbf{comb}}\ E\ E']\!] = \mathit{pcomb}\ \left(\mathit{unEnv}\_x\ \mathcal{L}\_0[\![E]\!]\right) \mathbin{\hat{\circ}} \left\langle \mathit{idL}, \mathcal{L}\_0[\![E']\!]\right\rangle$$

where *unEnv*ₓ ∈ *Lens* ⟦Δ ⊎ {x : σ}⟧ τ → *Lens* (⟦Δ⟧ × σ) τ and ⟨−, −⟩ ∈ *Lens* ⟦Δ⟧ A → *Lens* ⟦Δ⟧ B → *Lens* ⟦Δ⟧ (A × B) are lens combinators defined for any Δ as:

$$\begin{array}{ll} \mathit{get}\ (\mathit{unEnv}\_x\ \ell)\ (\mu, v) &= \mathit{get}\ \ell\ (\mu \uplus \{x = v\}) \\ \mathit{put}\ (\mathit{unEnv}\_x\ \ell)\ (\mu, v)\ u &= (\mu', v') \\ \quad \text{where } \mu' \uplus \{x = v'\} = (\mathit{put}\ \ell\ (\mu \uplus \{x = v\})\ u) \lhd \{x = v\} \\[1ex] \mathit{get}\ \langle \ell\_1, \ell\_2 \rangle\ \mu &= (\mathit{get}\ \ell\_1\ \mu,\ \mathit{get}\ \ell\_2\ \mu) \\ \mathit{put}\ \langle \ell\_1, \ell\_2 \rangle\ \mu\ (a, b) &= \mathit{put}\ \ell\_1\ \mu\ a \lhd \mathit{put}\ \ell\_2\ \mu\ b \end{array}$$
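The pairing combinator and environment merging can be sketched concretely in Haskell. This is our own simplified model, not the paper's definition: environments are association lists, a variable lens returns only the binding it touched on *put* (mirroring dom(μ) ⊆ dom(Δ)), and `mergeEnv` is a crude stand-in for the merging operator that demands agreement on shared bindings.

```haskell
import Data.Maybe (isNothing)

data Lens s v = Lens { get :: s -> v, put :: s -> v -> s }

-- A lens for a single variable: get looks it up; put returns only the
-- binding it updated.
keyL :: Eq k => k -> Lens [(k, v)] v
keyL k = Lens g p
  where
    g mu  = maybe (error "unbound variable") id (lookup k mu)
    p _ v = [(k, v)]

-- Merge two updated environments; shared bindings must agree.
mergeEnv :: (Eq k, Eq v) => [(k, v)] -> [(k, v)] -> [(k, v)]
mergeEnv m1 m2
  | and [ v == v' | (k, v) <- m1, Just v' <- [lookup k m2] ]
              = m1 ++ [ e | e@(k, _) <- m2, isNothing (lookup k m1) ]
  | otherwise = error "conflicting updates"

-- The pairing combinator: get duplicates the environment, put merges the
-- two updated environments.
pairL :: (Eq k, Eq v)
      => Lens [(k, v)] a -> Lens [(k, v)] b -> Lens [(k, v)] (a, b)
pairL l1 l2 = Lens g p
  where
    g mu        = (get l1 mu, get l2 mu)
    p mu (a, b) = mergeEnv (put l1 mu a) (put l2 mu b)
```

Note how duplication is handled: `put (pairL (keyL "x") (keyL "x")) [("x", 1)] (5, 5)` succeeds with `[("x", 5)]`, while updating the two copies inconsistently fails.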

Both combinators preserve ⊑-well-behavedness, where we assume the component-wise ordering on pairs. No "closedness" requirement is imposed on e in this version. From the construct, we can construct a higher-order function λf.λz.**comb** (x.f x) z : (**B**σ → **B**τ) → **B**σ′ → **B**τ′. That is, in HOBiT, lens combinators are just higher-order functions, as long as they permit the above-mentioned parameterisation. This observation means that we are able to systematically derive language constructs from lens combinators; as a matter of fact, the semantics of **case** is derived from a variant of the *cond* combinator [7].

Even better, the parameterised *pcomb* can be systematically constructed from the definition of *comb*. For *comb*, it is typical that *get* (*comb* ℓ) only uses *get* ℓ, and *put* (*comb* ℓ) only uses *put* ℓ; that is, *comb* essentially consists of two functions of types (σ → τ) → (σ′ → τ′) and (σ → τ → σ) → (σ′ → τ′ → σ′). Then, we can obtain *pcomb* of the above type merely by "monad"-ifying the two functions: using the reader monad T → − for the former, and the composition of the reader and writer monads T → (−, T) for the latter (the backward direction), suffices to construct *pcomb*.

A remaining issue is to ensure that *pcomb* preserves ⊑-well-behavedness, which ensures **comb** (x.e) e′ ∈ EΔ[**B**τ′] under the assumptions e ∈ EΔ⊎{x:σ}[**B**τ] and e′ ∈ EΔ[**B**σ′]. Currently, such a proof has to be done manually, even though *comb* preserves well-behavedness and *pcomb* is systematically constructed. Whether we can lift the correctness proof for *comb* to *pcomb* in a systematic way will be an interesting future exploration.

#### **5.3 Guards**

Guards used for branching are merely syntactic sugar in ordinary unidirectional languages such as Haskell. Interestingly, however, they actually increase the expressive power of HOBiT, by enabling inspection of updatable values without making the inspection functions bidirectional.

For example, Glück and Kawabe's reversible equivalence check [10] can be implemented in HOBiT as follows.

$$\begin{array}{l} \mathit{eqCheck} :: \mathbf{B}\sigma \to \mathbf{B}\sigma \to \mathbf{B}(\mathit{Either}\ (\sigma,\sigma)\ \sigma) \\ \mathit{eqCheck}\ x\ y = \underline{\mathtt{case}}\ (x, y)\ \underline{\mathtt{of}} \\ \quad (x', y') \mid x' == y' \to \underline{\mathtt{Right}}\ x'\ \underline{\mathtt{with}}\ \mathit{isRight}\ \underline{\mathtt{by}}\ (\lambda\\_.\lambda(\mathtt{Right}\ x).(x, x)) \\ \quad (x', y') \mid \mathit{otherwise} \to \underline{\mathtt{Left}}\ (x', y')\ \underline{\mathtt{with}}\ \mathit{isLeft}\ \underline{\mathtt{by}}\ (\lambda\\_.\lambda(\mathtt{Left}\ (x, y)).(x, y)) \end{array}$$

Here, (−, −) is the bidirectional version of the pair constructor. The exit condition *isRight* checks whether a value is headed by the constructor Right, and *isLeft* whether it is headed by Left. Notice that the backwards transformation of *eqCheck* fails when the updated view is Left (v, v) for some v.
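The intended get/put behaviour of *eqCheck* can be pictured as a plain function pair; the following is an illustrative Haskell sketch of that behaviour (the function names are ours, and this is not HOBiT's semantics).

```haskell
-- get merges equal components into Right, keeping unequal pairs in Left.
eqCheckGet :: Eq a => (a, a) -> Either (a, a) a
eqCheckGet (x, y)
  | x == y    = Right x
  | otherwise = Left (x, y)

-- put reconstructs the pair, duplicating the value in the Right case, and
-- fails (Nothing) exactly when the updated view is Left (v, v).
eqCheckPut :: Eq a => (a, a) -> Either (a, a) a -> Maybe (a, a)
eqCheckPut _ (Right x) = Just (x, x)
eqCheckPut _ (Left (x, y))
  | x /= y    = Just (x, y)
  | otherwise = Nothing  -- Left (v, v) violates the branch's exit condition
```

For example, `eqCheckPut (1, 2) (Left (5, 5))` yields `Nothing`, matching the failure case described above.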

#### **5.4 Syntax Sugar for Reconciliation Functions**

In their general form, reconciliation functions take two arguments for the computation of the new source. But, as we have seen, very often the arguments are not used in the definition and are therefore redundant. This observation motivates the following syntactic sugar.

$$p \to e\ \underline{\mathbf{with}}\ e'\ \underline{\mathbf{default}}\ \{x\_1 = e''\_1; \dots; x\_n = e''\_n\}$$

Here, x₁, ..., xₙ are the free variables in p. This syntactic sugar is translated as:

$$p \to e\ \underline{\mathbf{with}}\ e'\ \underline{\mathbf{by}}\ (\lambda\\_.\lambda\\_.\ p[e''\_1/x\_1, \dots, e''\_n/x\_n])$$

Furthermore, it is also possible to automatically derive some default values from their types. This idea can be effectively implemented if we extend HOBiT with type classes.
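For instance, deriving default values from types via type classes could look like the following minimal Haskell sketch; the class name `Default` and method `def` are our own, not part of HOBiT.

```haskell
-- A type class that assigns each type a canonical default value.
class Default a where
  def :: a

instance Default Bool where
  def = False            -- a conventional choice of default

instance Default [a] where
  def = []               -- the empty list

instance (Default a, Default b) => Default (a, b) where
  def = (def, def)       -- defaults are derived component-wise
```

A reconciliation function could then fill unused pattern variables with `def` at the appropriate types.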

#### **5.5 Inference of Exit Conditions**

It is possible to infer exit conditions from their surrounding contexts, an idea that has been studied in the literature on invertible programming [11,20] and that may benefit from range analysis.

Our prototype implementation adopts a very simple inference that constructs an exit condition λx.**case** x **of** {pₑ → True; _ → False} for each branch, where pₑ is the skeleton of the branch body e, constructed by replacing bidirectional constructors with their unidirectional counterparts, and non-constructor expressions with _. For example, from a : *appendB* x y, we obtain the pattern _ : _. This embarrassingly simple inference has proven to be handy for developing larger HOBiT programs, as we will see in Sect. 6.
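The skeleton construction and the resulting exit condition can be sketched over a tiny term language; the types `Term` and `Pat` and all function names here are our own simplification, not HOBiT's AST.

```haskell
-- A tiny term language: constructor applications, other expressions
-- (function calls etc.), and variables.
data Term = Con String [Term] | App String [Term] | Var String
  deriving (Eq, Show)

data Pat = PCon String [Pat] | PWild
  deriving (Eq, Show)

-- The skeleton of a branch body: constructors survive, everything else
-- becomes a wildcard.
skeleton :: Term -> Pat
skeleton (Con c ts) = PCon c (map skeleton ts)
skeleton _          = PWild

-- The inferred exit condition: does a value match the skeleton?
matches :: Pat -> Term -> Bool
matches PWild       _           = True
matches (PCon c ps) (Con c' ts) =
  c == c' && length ps == length ts && and (zipWith matches ps ts)
matches _           _           = False
```

On the text's example, `skeleton` applied to a term representing `a : appendB x y` (cons encoded as the constructor `":"`) yields the pattern `_ : _`, i.e. `PCon ":" [PWild, PWild]`.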

### **6 An Involved Example: Desugaring**

In this section, we demonstrate the programmability of HOBiT using the example of bidirectional desugaring [26]. Desugaring is a standard process for most programming languages, and making it bidirectional allows information in desugared form to be propagated back to the surface programs. It is argued convincingly in [26] that such bidirectional propagation (coined *resugaring*) is effective in mapping reduction sequences of desugared programs into those of the surface programs.

Let us consider a small programming language that consists of **let**, **if**, Boolean constants, and predefined operators.

**data** E = ELet E E | EVar *Int* | EIf E E E | ETrue | EFalse | EOp *Name* [E]
**type** *Name* = *String*

Variables are represented as de Bruijn indices.

Some operators in this language are syntactic sugar. For example, we may want to desugar

EOp "not" [e] as EIf e EFalse ETrue.

Also, e₁ || e₂ can be transformed to **let** x = e₁ **in if** x **then** x **else** e₂, which in our mini-language is the following.

EOp "or" [e₁, e₂] as ELet e₁ (EIf (EVar 0) (EVar 0) (*shift* 0 e₂))

Here, *shift* n is the standard shifting operator for de Bruijn-indexed terms that increments the variables that have indices of at least n (these variables are "free" in the given expression). We will program a bidirectional version of the above desugaring process in Figs. 7 and 8, with the particular goal of keeping the result of a backward execution as close as possible to the original sugared form (so that it is not merely a "decompilation" in the sense that the original source has to be consulted).
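As a warm-up for the bidirectional version, the unidirectional shift and the "or" desugaring can be written directly as a Haskell sketch. We adopt the convention of incrementing indices at or above the cutoff n, which is what the desugaring needs; `desugarOr` is our own illustrative name.

```haskell
type Name = String

data E = ELet E E | EVar Int | EIf E E E | ETrue | EFalse | EOp Name [E]
  deriving (Eq, Show)

-- Increment every variable whose index is at least the cutoff n.
shift :: Int -> E -> E
shift n (EVar i)
  | i >= n           = EVar (i + 1)
  | otherwise        = EVar i
shift n (ELet e1 e2) = ELet (shift n e1) (shift (n + 1) e2)  -- one binder
shift n (EIf c t e)  = EIf (shift n c) (shift n t) (shift n e)
shift n (EOp o es)   = EOp o (map (shift n) es)
shift _ e            = e

-- The "or" desugaring from the text, as a plain unidirectional function.
desugarOr :: E -> E -> E
desugarOr e1 e2 = ELet e1 (EIf (EVar 0) (EVar 0) (shift 0 e2))
```

Note that variables bound inside the shifted expression stay untouched, since the cutoff grows under each binder.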

**Fig. 8.** *desugarB*: bidirectional desugaring

We start with an auxiliary function *compos* [4] in Fig. 7, which is a useful building block for defining shifting and desugaring. We have omitted the straightforward exit conditions; they will be inferred as explained in Sect. 5.5. The function *mapB* is the bidirectional map. The reconciliation function *recE* tries to preserve as much source structure as possible by reusing the original source e. Here, *arities* :: [(*Name*, *Int*)] maps operator names to their arities (i.e., *arities* = [("or", 2), ("not", 1)]). The function *shift* is the standard unidirectional shifting function; we omit its definition, as it is similar to the bidirectional version in Fig. 8. Note that **default** is the syntactic sugar for reconciliation functions introduced in Sect. 5.4, and *incB* is the bidirectional increment function defined in Sect. 5.1. Thanks to *composB*, we only need to define the interesting parts in the definitions of *shiftB* and *desugarB*. The reconciliation functions *recE* and *toOp* try to keep as much source information as possible, which enables the behaviour that the backwards execution produces "not" and "or" in the sugared form only if the original expression has the sugar.

Consider the sugared expression EOp "or" [EOp "not" [ETrue], EOp "not" [EFalse]] as the source *source*.

```
HOBiT> :get desugarB source
ELet (EIf ETrue EFalse ETrue) (EIf (EVar 0) (EVar 0) (EIf EFalse EFalse ETrue))
{- let x = (if True then False else True)
   in if x then x else (if False then False else True) -}
```
The following updated views may be obtained by reductions from the view.

```
{- view1 ≡ let x = False in if x then x else (if False then False else True) -}
view1 = ELet EFalse (EIf (EVar 0) (EVar 0) (EIf EFalse EFalse ETrue))
{- view2 ≡ if False then False else (if False then False else True) -}
view2 = EIf EFalse EFalse (EIf EFalse EFalse ETrue)
{- view3 ≡ if False then False else True -}
view3 = EIf EFalse EFalse ETrue
```

The following are the corresponding backward transformation results.

```
HOBiT> :put desugarB source view1
EOp "or" [EFalse, EOp "not" [EFalse]]
HOBiT> :put desugarB source view2
EIf EFalse EFalse (EOp "not" [EFalse])
HOBiT> :put desugarB source view3
EOp "not" [EFalse]
```

As the AST structure of the view is changed, all three cases require branch-switching in the backwards executions; our program handles this with ease. For *view*2, the top-level expression EIf EFalse EFalse ... does not have a corresponding sugared form. Our program keeps the top level unchanged and proceeds to the subexpression with correct resugaring, a behaviour enabled by the appropriate use of the reconciliation function (the first line of *recE* for this particular case) in *composB*.

If we were to present the above results as evaluation steps in the surface language, one may argue that the second result above does not correspond to a valid evaluation step in the surface language. In [26], AST nodes introduced in desugaring are marked with the information of the original sugared syntax, and resugaring results containing marked nodes are skipped, as they do not correspond to any reduction step in the surface language. The marking also makes the backwards behaviour more predictable and stable under drastic changes to the view, as desugaring becomes injective with this change. This technique is orthogonal to our exploration here, and may be combined with our approach.

### **7 Related Work**

*Controlling Backwards Behaviour.* In addition to *put* ∈ S → V → S, many lens languages [3] supply a *create* ∈ V → S (which is in essence a right inverse of *get*) to be used when the original source data is unavailable. This happens when new data is inserted into the view and thus has no corresponding source for *put* to use, or when branch-switching happens but no reconciliation function is available. Being a right inverse, *create* does not fail (assuming it terminates), but since it is not guided by the original source, its results are more arbitrary. We do not include *create* in HOBiT, as it complicates the system without offering obvious benefits; our branch-switching facilities are perfectly capable of handling missing source data via reconciliation functions.

Using exit conditions in branching constructs for backwards evaluation can be found in a number of related fields: bidirectional transformation [7], reversible computation [34] and program inversion [11,20]. Our design of **case** is inspired by the *cond* combinator in the lens framework [7] and the if-statement in Janus [34]. A similar combinator is *Case* in BiGUL [16], where a branch has a function performing a role similar to an exit condition, but taking the original source in addition. This difference makes *Case* more expressive than *cond*; for example, *Case* can implement matching lenses [2]. Our design of **case** follows *cond* for its relative simplicity, but the same underlying technique can be applied to *Case*, as mentioned in Sect. 5.2. In the context of *bidirectionalization* [19,29,30], there is the idea of "plug-ins" [31], which are similar to reconciliation functions in the sense that source values can be adapted to direct backwards execution.

*Applicative Lenses.* The applicative lens framework [21,23] provides a way to use λ-abstraction and function application as in normal functional programming to compose lenses. Note that this use of "applicative" refers to the classical applicative (functional) programming style, and is not directly related to the Applicative functor in Haskell. In this sense, it shares a similar goal to ours. Crucially, however, applicative lens lacks HOBiT's ability to allow λ-bound variables to be used freely, and as a result suffers from the same limitation as lens languages. There are also a couple of technical differences between applicative lens and our work: applicative lens is based on the Yoneda embedding, while ours is based on separating Γ and Δ and having three semantics (Sect. 4); and applicative lens is implemented as an embedded DSL, while HOBiT is given as a standalone language. An embedded implementation of HOBiT is possible, but a type-correct embedding would expose the handling of the environment Δ to programmers, which is undesirable.

*Lenses and Their Extensions.* As mentioned in Sect. 1, the most common way to construct lenses is by using combinators [3,7,8], in which lenses are treated as opaque objects and composed by using lens combinators. Our goal in this paper is to enhance the programmability of lens programming while keeping its expressive power as much as possible. In HOBiT, primitive lenses can be represented as functions on **B**-typed values (Sect. 5.1), and lens combinators satisfying certain conditions can be represented as language constructs with binders (Sect. 5.2), which is at least enough to express the original lenses in [7].

Among the extensions of lens languages [2,3,7–9,16,17,27,32], there exist a few that go beyond the classical lens model [7], namely quotient lenses [8], symmetric lenses [14], and edit-based lenses [15]. A natural question to ask is whether our development, which is based on classical lenses, can be extended to them. The answer depends on the treatment of value environments μ in *get* and *put*. In our semantics, we assume a non-linear system, as we can use the same variable in μ any number of times. This requires us to extend classical lenses with merging and defaulting operations in *put*, together with ⊑-well-behavedness, but it makes the syntax and type system of HOBiT simple, and keeps HOBiT free from the design issues of linear programming languages [25]. Such an extension of lenses should be applicable to some kinds of lens models, including quotient lenses and symmetric lenses, but its applicability is not clear in general. We also note that allowing duplication in bidirectional transformation remains an open problem, as it essentially entails multiple views and synchronisation among them.

### **8 Conclusion**

We have designed HOBiT, a higher-order bidirectional programming language in which lenses are represented as functions and lens combinators are represented as language constructs with binders. The main advantage of HOBiT is that users can program in a style similar to conventional functional programming, while still enjoying the benefits of lenses (i.e., the expressive power and well-behavedness guarantee). This has allowed us to program realistic examples with relative ease.

HOBiT for the first time introduces a truly "functional" way of constructing bidirectional programs, which opens up a new area of future explorations. Particularly, we have just started to look at programming techniques in HOBiT. Moreover, given the resemblance of HOBiT code to that in conventional languages, the application of existing programming tools becomes plausible.

**Acknowledgements.** We thank Shin-ya Katsumata, Makoto Hamana and Kazuyuki Asada for their helpful comments on the category theory and denotational semantics, from which our formal discussions originate. The work was partially supported by JSPS KAKENHI Grant Numbers 24700020, 15K15966, and 15H02681.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Dualizing Generalized Algebraic Data Types by Matrix Transposition**

Klaus Ostermann(B) and Julian Jabs

University of Tübingen, Tübingen, Germany
{klaus.ostermann,julian.jabs}@uni-tuebingen.de

**Abstract.** We characterize the relation between generalized algebraic datatypes (GADTs) with pattern matching on their constructors on the one hand, and generalized algebraic co-datatypes (GAcoDTs) with copattern matching on their destructors on the other hand: GADTs can be converted mechanically to GAcoDTs by refunctionalization, GAcoDTs can be converted mechanically to GADTs by defunctionalization, and both defunctionalization and refunctionalization correspond to a transposition of the matrix in which the equations for each constructor/destructor pair of the (co-)datatype are organized. We have defined a calculus, *GADT*<sup>T</sup>, which unifies GADTs and GAcoDTs in such a way that GADTs and GAcoDTs are merely different ways to partition the program.

We have formalized the type system and operational semantics of *GADT*<sup>T</sup> in the Coq proof assistant and have mechanically verified the following results: (1) the type system of *GADT*<sup>T</sup> is sound, (2) defunctionalization and refunctionalization can translate GADTs to GAcoDTs and back, (3) both transformations are type- and semantics-preserving and are inverses of each other, (4) (co-)datatypes can be represented by matrices in such a way that the aforementioned transformations correspond to matrix transposition, and (5) GADTs are extensible in an exactly dual way to GAcoDTs; we thereby clarify folklore knowledge about the "expression problem".

We believe that the identification of this relationship can guide future language design of "dual features" for data and codata.

### **1 Introduction**

The duality between data and codata, between construction and destruction, between smallest and largest fixed points, is a long-standing topic in the PL community. While some languages, such as Haskell, do not distinguish explicitly between data and codata, there has been a "growing consensus" [1] that the two should not be mixed up. Many ideas that are well known from the data world have counterparts in the codata world. One work that is particularly relevant for this paper is copatterns, proposed by Abel et al. [1]. Using copatterns,

**Electronic supplementary material** The online version of this chapter (https:// doi.org/10.1007/978-3-319-89884-1 3) contains supplementary material, which is available to authorized users.

the language support for codata is very symmetrical to that for data: Data types are defined in terms of constructors, functions consuming data are defined using pattern matching on constructors; codata types are defined in terms of destructors, functions producing codata are defined using copattern matching on destructors.

Another example of designing dual features for codata is the recently proposed codata version of inductive data types [36]. However, coming up with these counterparts requires ingenuity. The overarching goal of this work is to replace the required ingenuity by a mechanical derivation. A key idea towards this goal has been proposed by Rendel et al. [31], namely to relate the data and codata worlds by refunctionalization [16] and defunctionalization [17,32].

Defunctionalization is a global program transformation that turns higher-order programs into first-order programs. By defunctionalizing a program, higher-order function types are replaced by sum types with one variant per function that exists in the program. For instance, if a program contains two functions of type *Nat* → *Nat*, then these functions are represented by a sum type with two variants, one for each function, whereby the type components of each variant store the content of the free variables that show up in the function definition. Defunctionalized function calls become calls to a special first-order *apply* function, which pattern-matches on the aforementioned sum type to dispatch the call to the right function body.
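The transformation just described can be made concrete with a small Haskell example of our own (using `Int` to stand in for *Nat*): two functions become a sum type with one variant each, storing their free variables, and all calls go through a single first-order `apply`.

```haskell
-- Defunctionalized representation of two functions of type Int -> Int.
-- Each variant stores the free variables of the corresponding function body.
data Fun = AddN Int   -- represents \x -> x + n, with free variable n
         | Double     -- represents \x -> x + x, no free variables
  deriving (Eq, Show)

-- The single first-order apply function dispatches on the sum type.
apply :: Fun -> Int -> Int
apply (AddN n) x = x + n
apply Double   x = x + x

-- A former higher-order use site now passes Fun values instead of closures.
applyAll :: [Fun] -> Int -> [Int]
applyAll fs x = [ apply f x | f <- fs ]
```

For example, `applyAll [AddN 1, Double] 5` yields `[6, 10]`, just as mapping the original closures over `5` would.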

Refunctionalization is the inverse transformation, but traditionally it only works (easily) on programs that are in the image of defunctionalization [16]. In particular, it is not clear how to refunctionalize programs when there is more than one function (like *apply*) that pattern-matches on the same data type. Rendel et al. [31] have shown that this problem goes away when functions are generalized to arbitrary codata (with functions being the special codata type with only one *apply* destructor), because then every pattern-matching function in a program to be refunctionalized can be expressed as another destructor.

The main goal of this work is to extend the de- and refunctionalization correspondence between data and codata to generalized algebraic datatypes (GADTs) [8,40] and their codata counterpart, which we call Generalized Algebraic Codata types (GAcoDTs). More concretely, this paper makes the following contributions.


The remainder of this paper is structured as follows. In Sect. 2 we give an informal overview of our main contributions by means of an example and using conventional concrete syntax. In Sect. 3 we present the syntax, operational semantics, and type system of GADT <sup>T</sup> . Section 4 presents the aforementioned mechanically verified properties of GADT <sup>T</sup> . In Sect. 5, we discuss applications and limitations of GADT <sup>T</sup> , talk about termination/productivity and directions for future work, and describe how we formalized GADT <sup>T</sup> in Coq. Finally, Sect. 6 discusses related work and Sect. 7 concludes.

### **2 Informal Overview**

Figure 1 illustrates the language design of GADT <sup>T</sup> in terms of an example. The left-hand side shows an example using GADTs and functions that patternmatch on GADT constructors. The right-hand side shows the same example using GAcoDTs and functions that copattern-match on GAcoDT destructors. The right-hand side is the refunctionalization of the left hand side; the left-hand side is the defunctionalization of the right-hand side.

*Simply-Typed (Co)Datatypes.* Let us first look at the Nat (co)datatype. Every data or codata type has an *arity*: the number of type arguments it receives. Since GADT<sup>T</sup> only features types of kind \*, we simply state the number of type arguments in the (co)data type declaration. Nat receives zero type arguments, hence Nat illustrates the simply-typed setting with no type parameters. Functions in GADT<sup>T</sup>, like add on the left-hand side, are first-order only; higher-order functions can be encoded as codata instead. Functions always (co)pattern-match on their first argument. (Co)pattern matching on multiple arguments, as well as nested and deep (co)pattern matching, is not supported directly and must be encoded via auxiliary functions. We see that the refunctionalized version of Nat on the right-hand side turns constructors into functions, functions into destructors, and pattern matching into copattern matching. Abel et al. [1] use "dot notation" for copattern matching and destructor application; for instance, they

*Data fragment (left-hand side):*

```
data Nat[0] where
  zero(): Nat
  succ(Nat): Nat
function add(Nat,Nat): Nat where
  add(zero(), x) = x
  add(succ(y),x) = succ(add(y,x))
data List[1] where
  nil[A](): List[A]
  cons[A](A, List[A]): List[A]
function length[A](List[A]): Nat where
  length[_](nil[_]) = 0
  length[B](cons[_](x,xs)) =
    succ(length[B](xs))
function sum(List[Nat]): Nat where
  sum(nil[_]) = 0
  sum(cons[_](x,xs)) = x + sum(xs)
data Tree[1] where
  node(Nat): Tree[Nat]
  branch[A](List[Tree[A]]): Tree[List[A]]
function unwrap(Tree[Nat]): Nat where
  unwrap(node(n)) = n
  unwrap(branch[_](xs)) = impossible
function width[A](Tree[A]): Nat where
  width[_](node(n)) = 0
  width[_](branch[C](xs)) =
    length[C](xs)
```

*Codata fragment (right-hand side):*

```
codata Nat[0] where
  add(Nat,Nat): Nat
function zero(): Nat where
  add(zero(),x) = x
function succ(Nat): Nat where
  add(succ(y),x) = succ(add(y,x))
codata List[1] where
  length[A](List[A]): Nat
  sum(List[Nat]): Nat
function nil[A](): List[A] where
  length[_](nil[_]) = 0
  sum(nil[_]) = 0
function cons[A](A, List[A]): List[A] where
  length[B](cons[_](x,xs)) =
    succ(length[B](xs))
  sum(cons[_](x,xs)) = x + sum(xs)
codata Tree[1] where
  unwrap(Tree[Nat]): Nat
  width[A](Tree[A]): Nat
function node(Nat): Tree[Nat] where
  unwrap(node(n)) = n
  width[_](node(n)) = 0
function branch[A](List[Tree[A]]): Tree[List[A]] where
  unwrap(branch[_](xs)) = impossible
  width[_](branch[C](xs)) =
    length[C](xs)
```
**Fig. 1.** The same example in the data fragment (left) and codata fragment (right)


**Fig. 2.** Matrix representation of List GADT from Fig. 1 (left)


**Fig. 3.** Matrix representation of List GAcoDT from Fig. 1 (right). This matrix is the transpose of the matrix in Fig. 2.

would write succ(y).add(x) = succ(y.add(x)) instead of add(succ(y),x) = succ(add(y,x)) on the right-hand side of Fig. 1. We use the same syntax for constructor calls, function calls, and destructor calls because then the equations are not affected by de- and refunctionalization.

*Parametric (Co)Datatypes.* The List datatype illustrates the classical special case of GADTs with no indexing. Type arguments of constructors, functions, and destructors are both declared and passed via rectangular brackets [...] (loosely like in Scala). Like System F, GADT<sup>T</sup> has no type inference; all type annotations and type applications must be given explicitly. GADT<sup>T</sup> has a deliberately redundant way of binding type parameters. When defining an equation of a polymorphic function with a polymorphic first argument, we use square brackets to bind both the type parameters of the function and those of the constructor/destructor on which we (co)pattern-match. For instance, in the equation length[B](cons[_](x,xs)) = ... on the left-hand side, B is the type parameter of the length function, whereas the underscore (which we use when the type argument is not relevant; we could replace it by a proper type variable name) binds the type argument of the constructor with which the list was created. In this example, we could also have written the equation as length[_](cons[B](x,xs)) = ... because both type parameters must necessarily be the same, but in the general case we need access to both sets of type variables (as the next example will illustrate). It is important that we do not (co)pattern-match on type arguments, since this would destroy parametricity; rather, the [...] notation on the left-hand side of an equation is only a binding construct for type variables.

Codatatypes also serve as a generalization of first-class functions. The code below shows the definition of a general function type together with a specific family of first-class functions, addn, that can be passed as arguments and returned as results, defined by a codata generator function with return type Function[Nat,Nat].

```
codata Function[2] where
  apply[A,B](Function[A,B], A): B
function addn(Nat): Function[Nat,Nat] where
  apply[_,_](addn(n),m) = add(n,m)
```
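Transposing this matrix recovers classical Reynolds-style defunctionalization: the codata generator addn becomes a constructor (the "closure record"), and the destructor apply becomes a dispatch function defined by pattern matching. A sketch of the defunctionalized counterpart, in the same concrete syntax (this block is illustrative and not part of the original figure):

```
data Function[2] where
  addn(Nat): Function[Nat,Nat]
function apply[A,B](Function[A,B], A): B where
  apply[_,_](addn(n),m) = add(n,m)
```

Adding further first-class functions on the codata side corresponds to adding further constructors of Function on the data side.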
*Type Parameter Binding.* Of those two sets of type parameter bindings, the one for functions is in a way always redundant because we could use the type variable declaration inside the function declaration instead. For instance, in the equation length[B](cons[_](x,xs)) = succ(length[B](xs)) on the left-hand side we could use the type parameter A of the enclosing function declaration instead. However, in GADT<sup>T</sup> the scope of the type variables in the function declaration does not extend to the equations, and the type arguments must be bound anew in every equation. The reason is that we want to design the equations in such a way that they do not need to be touched when de/refunctionalizing a (co)datatype. For instance, when refunctionalizing a datatype, a function declaration is turned into a destructor declaration, and what used to be a type argument bound in the enclosing function declaration becomes a type argument bound in a remote destructor declaration; to make typechecking modular we hence need a local binding construct. Our main goal in designing GADT<sup>T</sup> was not to make it convenient for programmers but to make the relation between GADTs and GAcoDTs as simple as possible; furthermore, a less verbose surface syntax could easily be added on top.

If we look at the corresponding List codatatype on the right-hand side, we see that the sum function from the left-hand side, which accepts only a list of numbers, turns into a destructor that is only applicable to those instances of List whose type parameter is Nat. This is similar to methods in object-oriented programming whose availability depends on type parameters [28], but here we see that this feature arises "mechanically" from the de/refunctionalization correspondence.

*GA(co)DTs.* The Tree (co)datatype illustrates a usage of GA(co)DTs that cannot be expressed with traditional parametric datatypes. We can see that by looking at the return types of the constructors of the Tree datatype; they are Tree[Nat] and Tree[List[A]] instead of Tree[A]. The Tree codatatype also uses the power of GAcoDTs in the unwrap destructor<sup>1</sup> because its first argument type is Tree[Nat] rather than Tree[A]. The GADT constructor node(Nat): Tree[Nat] turns into a function that returns a Tree[Nat] on the right-hand side. The Tree example illustrates two additional issues that did not show up in the earlier examples.

First, it illustrates that type unification may make some pattern matches impossible, as illustrated by the unwrap(branch[_](xs)) = impossible equation on the left-hand side. The equation is impossible because the function argument type Tree[Nat] cannot be unified with the constructor return type Tree[List[A]].<sup>2</sup> In GADT<sup>T</sup>, we require that pattern matching is always complete, but impossible equations are not type-checked; the right-hand side can hence be filled with any dummy term. Second, the equation width[_](branch[C](xs)) = length[C](xs) illustrates the case where it is essential that we can bind constructor type arguments; otherwise we would have no name for the type argument we need to pass to length. Such type arguments are sometimes called *existential* or *phantom* [8] because if we have a branch of type Tree[A], we only know that there exists some type that was used in the invocation of the branch constructor, but that type does not show up in the structure of Tree[A].

We see again how both impossible equations and the need to access constructor type arguments translate naturally into corresponding features in the codata world. For impossible equations, we need to check whether the first destructor argument type can be unified with the function return type. Access to existential

<sup>1</sup> The unwrap destructor is meant to be used to extract the number from a tree that directly contains a number, i.e., a tree constructed with constructor node.

<sup>2</sup> This fits with our intention that unwrap should only work on a node (which directly contains a number).

constructor type arguments turns into access to local function type arguments; conversely, access to existential destructor type arguments in the codata world turns into access to local function type arguments.

*GADT = GAcoDT*<sup>T</sup> *.* We can see that the relation between GADTs and GAcoDTs is as promised when looking at Figs. 2 and 3. These two figures show a slightly different representation of the List (co)datatype and associated functions from Fig. 1. In this presentation, we have dropped all keywords from the language, such as function, data, and codata. The reason for dropping these keywords is that now function signatures in the data fragment look the same as destructor signatures in the codata fragment, and constructor signatures in the data fragment look the same as function signatures in the codata fragment. Figure 2 organizes the datatype in the form of a matrix: the first row lists the datatype and its constructor signatures, the first column lists the signatures of the functions that pattern-match on the datatype, and the inner cells represent the equations for each combination of constructor and function. Figure 3 does the same for the List codatatype: the first row lists the codatatype and its destructor signatures, the first column lists the signatures of functions that copattern-match on the codatatype, and the inner cells represent the equations for each combination of function and destructor. We can now see that the relation between GADTs and GAcoDTs is indeed rather simple: it is just matrix transposition.
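For concreteness, the matrix of Fig. 2 can be rendered in text form as follows, reconstructed from the List code of Fig. 1 (first row: the datatype and its constructor signatures; first column: the pattern-matching function signatures; inner cells: the equations); transposing it yields the matrix of Fig. 3:

```
List[1]                 | nil[A](): List[A]     | cons[A](A, List[A]): List[A]
------------------------+-----------------------+---------------------------------
length[A](List[A]): Nat | length[_](nil[_]) = 0 | length[B](cons[_](x,xs)) =
                        |                       |   succ(length[B](xs))
sum(List[Nat]): Nat     | sum(nil[_]) = 0       | sum(cons[_](x,xs)) = x + sum(xs)
```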

An essential property of this transformation is that other (co)datatypes and functions are completely unaffected by it. For instance, the Tree datatype (or codatatype, regardless of which version we use) looks the same, regardless of whether we encode List in data or in codata style. Defunctionalization and refunctionalization are still global transformations in that we need to find all functions that pattern-match on a datatype (for refunctionalization) or all functions that copattern-match on a codatatype (for defunctionalization), but the rest of the program, including all clients of those (co)datatypes and functions, remains the same.

*Infinite Codata, Termination, Productivity.* The semantics of codata is usually defined via greatest fixed point constructions that include the possibility to represent "infinite" structures, such as streams. This is not the focus of this work, but since our examples so far did not feature such "infinite" structures, and we do not want to give the impression that our codatatypes somehow lack the expressiveness to express streams and the like, we show here an example of how to encode a stream of zeros, both in the codata representation (left) and, defunctionalized, in the data representation (right).
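A sketch of such a stream of zeros in the concrete syntax of Fig. 1 (the names Stream, head, tail, and zeros are illustrative). The codata representation defines the stream by copattern matching on its two destructors:

```
codata Stream[0] where
  head(Stream): Nat
  tail(Stream): Stream
function zeros(): Stream where
  head(zeros()) = zero()
  tail(zeros()) = zeros()
```

Defunctionalizing (transposing the matrix) turns zeros into a constructor and head and tail into pattern-matching functions:

```
data Stream[0] where
  zeros(): Stream
function head(Stream): Nat where
  head(zeros()) = zero()
function tail(Stream): Stream where
  tail(zeros()) = zeros()
```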


Codata is also often associated with guarded corecursion to ensure productivity. In the copattern formulation of codata, productivity and termination coincide [2]. Due to our unified treatment of data and codata, a single check is sufficient for both termination/productivity of programs. In Sect. 5.3, we discuss a simple syntactic check that corresponds to both structural recursion and guarded corecursion.

*Properties of* GADT<sup>T</sup> *.* In the remainder of this paper, we formalize GADT<sup>T</sup> in a style similar to the matrix representation of (co)datatypes we have just seen. We define typing rules and a small-step operational semantics and prove formal versions of the following informal theorems: (1) the type system of GADT<sup>T</sup> is sound (progress and preservation); (2) defunctionalization and refunctionalization (that is, matrix transposition) of (co)datatypes preserve well-typedness and operational semantics; (3) both kinds of matrices are modularly extensible in one dimension, namely by adding more rows to the matrix. This means that we can modularly add constructors or destructors and their respective equations without breaking type soundness, as long as the new equations are sound themselves.

### **3 Formal Semantics**

We have formalized GADT<sup>T</sup> and all associated theorems and proofs in Coq<sup>3</sup>. Here we present a traditional representation of the formal syntax using context-free grammars, a small-step operational semantics, and a type system.

We have formalized the language in such a way that we abstract over the physical representation of matrices as described in the previous section, hence we do not need to distinguish between GADTs and GAcoDTs. In the following, we say *constructor* to denote either a constructor of a datatype, or a function that copattern-matches on a codatatype. We say *destructor* to denote either a function that pattern-matches on a datatype, or a destructor of a codatatype. The language is defined in terms of constructors and destructors; we will later see that GADTs and GAcoDTs are merely different organizations of destructors and constructors.

### **3.1 Language Design Rationale**

Our main goal in the formalization is to clarify the relation between GADTs and GAcoDTs, and not to design a calculus that is convenient to use as a

<sup>3</sup> Full Coq sources are available in the supplemental material.

programming language. Hence we have left out many standard features of programming calculi that would have made the description of that relation more complicated. In particular:


### **3.2 Notational Conventions**

As usual, we use the same letters for both non-terminal symbols and meta-variables; e.g., t stands for the non-terminal in the grammar of terms, but inside inference rules it is a meta-variable that stands for any term. We use the notation $\overline{t}$ to denote a list $t_1, t_2, \ldots, t_{|\overline{t}|}$, where $|\overline{t}|$ is the length of the list. We also use list notation to denote iteration; e.g., $P, \Gamma \vdash \overline{t} : \overline{T}$ means $P, \Gamma \vdash t_1 : T_1, \ldots, P, \Gamma \vdash t_{|\overline{t}|} : T_{|\overline{t}|}$. To keep the notation readable, we write $\overline{x} : \overline{T}$ instead of $\overline{x : T}$ to denote $x_1 : T_1, \ldots, x_n : T_n$.

We use the notation $t[x := t']$ to denote the substitution of all free occurrences of $x$ in $t$ by $t'$, and similarly $T[X := T']$ and $t[X := T']$ for the substitution of type variables in types and terms, respectively.

### **3.3 Syntax**

The syntax of GADT<sup>T</sup> is defined in Fig. 4. Types have the form $m[\overline{T}]$, where m is the name of a GADT or GAcoDT (in the following referred to as *matrix name*), and square brackets denote type application. Types can contain type variables X. In the syntax of terms t, x denotes parameters that are bound by (co)pattern matching and y denotes other parameters. A constructor call $c[\overline{T}](\overline{t})$ takes zero or


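The grammar portion of Fig. 4 can be summarized as follows; this is a sketch reconstructed from the prose of Sects. 3.3 and 3.4 (in particular, the exact shape of the evaluation contexts E is an assumption based on the stated call-by-value left-to-right order):

$$\begin{array}{lcll}
T & ::= & X \;\mid\; m[\overline{T}] & \textit{types} \\
t & ::= & x \;\mid\; y \;\mid\; c[\overline{T}](\overline{t}) \;\mid\; d[\overline{T}](t, \overline{t}) & \textit{terms} \\
v & ::= & c[\overline{T}](\overline{v}) & \textit{values} \\
E & ::= & [\,] \;\mid\; c[\overline{T}](\overline{v}, E, \overline{t}) \;\mid\; d[\overline{T}](\overline{v}, E, \overline{t}) & \textit{evaluation contexts}
\end{array}$$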
$$\frac{P \vdash t \to t'}{P \vdash E[t] \to E[t']} \tag{E-Ctx}$$

$$\frac{\begin{array}{c}
m \mapsto (a, \overline{C}, \overline{D}, lookup) \in P \\
D \in \overline{D} \qquad D = d[\ldots](m[\ldots], \ldots) \\
C \in \overline{C} \qquad C = c[\ldots](\ldots) \\
lookup(C, D) = d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t
\end{array}}{P \vdash d[\overline{S}](c[\overline{T}](\overline{v}), \overline{u}) \to t[\overline{X} := \overline{T}, \overline{Y} := \overline{S}][\overline{x} := \overline{v}, \overline{y} := \overline{u}]} \tag{E-Fire}$$

#### **Fig. 4.** Syntax and operational semantics of *GADT* <sup>T</sup>

more arguments, whereas a destructor call $d[\overline{T}](t, \overline{t})$ takes at least one argument (namely the one to be destructed). Both destructors and constructors can have type parameters, which must be passed via square brackets.

A constructor signature $c[\overline{X}](\overline{T}) : m[\overline{S}]$ defines the number and types of parameters and the type parameters to the constructed type. Its output type cannot be a type variable but must be some concrete matrix type $m[\overline{S}]$. A destructor signature, on the other hand, must have a concrete matrix type as its first argument and can have an arbitrary return type. Equations $d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t$ define what happens when a constructor c meets a destructor d. The $\overline{x}$ bind the components of the constructor, whereas the $\overline{y}$ bind the remaining parameters of the destructor call. We also bind both the type arguments to the constructor $\overline{X}$ and the destructor $\overline{Y}$, such that they can be used inside t. In many cases, the $\overline{X}$ will provide access to the same types as the $\overline{Y}$, but in the general case we need both because both constructors and destructors may contain phantom types [8].

Matrices M are an abstract representation of both GADTs and GAcoDTs, together with the functions that pattern-match (for GADTs) or copattern-match (for GAcoDTs) on the GA(co)DTs. A matrix has an arity a (the number of type parameters it receives), a list of constructors γ, and a list of destructors δ. It also has a lookup function that returns an equation for every constructor/destructor pair on which the matrix is defined (hence the type of matrices is a dependent type). There must be an equation for each constructor/destructor pair, but in the case of impossible combinations, the equations are not type-checked and some dummy term can be inserted. A program P is just a finite mapping from matrix names to matrices.

### **3.4 Operational Semantics**

We define the operational semantics, also in Fig. 4, via an evaluation context E, which, together with E-Ctx, defines a standard call-by-value left-to-right evaluation order. Not surprisingly, the only interesting rule is E-Fire, which defines the reduction behavior when a destructor meets a constructor. We look up the corresponding matrix in the program and look up the equation for that constructor/destructor pair. In the body of the equation, t, we perform two substitutions: (1) we substitute the formal type arguments $\overline{X}$ and $\overline{Y}$ by the actual type arguments of the constructor and destructor calls, and (2) we substitute the pattern variables $\overline{x}$ by the components $\overline{v}$ of the constructor and the variables $\overline{y}$ by the current arguments $\overline{u}$.
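For example, with the List equations of Fig. 1, E-Fire reduces a length computation on a one-element list in two steps (a worked instance; the type-argument substitution instantiates the equation's B with Nat):

```
length[Nat](cons[Nat](zero(), nil[Nat]()))
  →  succ(length[Nat](nil[Nat]()))    by E-Fire: x := zero(), xs := nil[Nat]()
  →  succ(0)                          by E-Fire: equation for length and nil
```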

### **3.5 Typing**

The typing and well-formedness rules are defined in Fig. 5. Let us first look at the typing of terms. The rules for variable lookup are standard. The constructor rule T-Constr checks that the numbers of type and term arguments match the declaration and checks the types of all arguments, whereby the type variables are substituted by the type arguments of the actual constructor call. Constructor names must be globally unique, hence the matrix to which the constructor belongs is not relevant.

This is different for typing destructor calls (T-Dest). A destructor is resolved by first determining the matrix m of the first destructor argument, and then the destructor is looked up in that matrix. It is hence OK if the same destructor name shows up in multiple matrices. When considering codata as "objects" like in object-oriented programming [24], this corresponds to the familiar situation that different classes can define methods with the same name. In the GADT case, this corresponds to allowing multiple pattern-matching functions of the same name that are disambiguated by the type of their first argument.

In Wf-Eq, we construct the appropriate typing context to type-check the right-hand side of equations. We allow implicit α-renaming of type variables

$$\Gamma ::= \epsilon \;\mid\; x : T,\, \Gamma \;\mid\; y : T,\, \Gamma \tag{Typing contexts}$$

$$\frac{x : T \in \Gamma}{P, \Gamma \vdash x : T} \;(\text{T-Var}) \qquad \frac{y : T \in \Gamma}{P, \Gamma \vdash y : T} \;(\text{T-Var})$$

$$\frac{\begin{array}{c}
m \mapsto (\ldots, \ldots, \ldots\, d[\overline{X}](m[\overline{T}], \overline{U}) : T \ldots, \ldots) \in P \\
P, \Gamma \vdash t : m[\overline{T}][\overline{X} := \overline{S}] \qquad
\forall i.\ P, \Gamma \vdash t_i : U_i[\overline{X} := \overline{S}] \\
|\overline{X}| = |\overline{S}| \qquad |\overline{U}| = |\overline{t}|
\end{array}}{P, \Gamma \vdash d[\overline{S}](t, \overline{t}) : T[\overline{X} := \overline{S}]} \tag{T-Dest}$$

$$\frac{\begin{array}{c}
\ldots \mapsto (\ldots\, c[\overline{X}](\overline{T}) : T \ldots, \ldots, \ldots) \in P \\
\forall i.\ P, \Gamma \vdash t_i : T_i[\overline{X} := \overline{S}] \\
|\overline{X}| = |\overline{S}| \qquad |\overline{T}| = |\overline{t}|
\end{array}}{P, \Gamma \vdash c[\overline{S}](\overline{t}) : T[\overline{X} := \overline{S}]} \tag{T-Constr}$$

$$\frac{\begin{array}{c}
C = c[\overline{X'}](\overline{T}) : m[\overline{S}] \qquad |\overline{X}| = |\overline{X'}| \\
D = d[\overline{Y'}](m[\overline{S'}], \overline{T'}) : T \qquad |\overline{Y}| = |\overline{Y'}| \\
\textit{all-distinct}(\overline{X}, \overline{Y}) \qquad \textit{all-distinct}(\overline{X'}, \overline{Y'}) \\
\textit{most-general-unifier}(m[\overline{S}], m[\overline{S'}]) = \sigma \\
P,\ \overline{x} : \sigma(\overline{T}),\ \overline{y} : \sigma(\overline{T'}) \vdash \sigma(t[\overline{X} := \overline{X'}, \overline{Y} := \overline{Y'}]) : \sigma(T)
\end{array}}{P, m \vdash d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t \text{ OK in } C, D} \tag{Wf-Eq}$$

$$\frac{\begin{array}{c}
C = \ldots : m[\overline{S}] \qquad D = \ldots(m[\overline{S'}], \ldots) : \ldots \\
\textit{most-general-unifier}(m[\overline{S}], m[\overline{S'}]) = \text{error}
\end{array}}{P, m \vdash d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t \text{ OK in } C, D} \tag{Wf-Infsble}$$

$$\frac{|\overline{S}| = a \qquad FV(\overline{T}) \subseteq \overline{X}}{c[\overline{X}](\overline{T}) : m[\overline{S}] \text{ OK in } m, a} \tag{Wf-Constr}$$

$$\frac{|\overline{S}| = a \qquad FV(\overline{S}) \subseteq \overline{Y} \qquad FV(\overline{T}) \subseteq \overline{Y}}{d[\overline{Y}](m[\overline{S}], \overline{T}) : T \text{ OK in } m, a} \tag{Wf-Destr}$$

$$\frac{\begin{array}{c}
\forall C \in \overline{C}, \forall D \in \overline{D}: \\
C \text{ OK in } m, a \qquad D \text{ OK in } m, a \\
P, m \vdash lookup(C, D) \text{ OK in } C, D \\
\textit{all-names-distinct}(\overline{D})
\end{array}}{m \mapsto (a, \overline{C}, \overline{D}, lookup) \text{ OK in } P} \tag{Wf-Matr}$$

$$\frac{\begin{array}{c}
\forall m \in dom(P).\ m \mapsto P(m) \text{ OK in } P \\
\textit{all-names-distinct}(\textit{ctors}(P))
\end{array}}{P \text{ OK}} \tag{Wf-Prog}$$

#### **Fig. 5.** Typing and well-formedness

to prevent accidental name clashes (checked by *all-distinct*). We compute the most general unifier of the two matrix types in the constructor and destructor, respectively, to combine the type knowledge about the matrix type from the constructor and destructor type. If no such unifier exists, the equation is vacuously well-formed because the particular combination of constructor and destructor can never occur during execution of well-typed terms (Wf-Infsble). Otherwise, we use the unifier σ and apply it to the given type annotations to type-check the term t. A unifier σ is a mapping from type variables to types, but we also use the notation σ(t) and σ(T) to apply σ to all occurrences of type variables inside a term t or a type T, respectively.
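As a worked instance of Wf-Eq and Wf-Infsble, consider the two branch equations of the Tree datatype from Fig. 1 (writing C and D for the constructor and destructor signatures involved, and renaming the constructor's type variable to C' to avoid confusion with the meta-variable C):

```
C = branch[C'](List[Tree[C']]): Tree[List[C']]
D = width[A](Tree[A]): Nat
most-general-unifier(Tree[List[C']], Tree[A]) = σ = {A ↦ List[C']}
to check:  P, xs : List[Tree[C']] ⊢ length[C'](xs) : Nat        (Wf-Eq)

D = unwrap(Tree[Nat]): Nat
most-general-unifier(Tree[List[C']], Tree[Nat]) = error          (Wf-Infsble)
```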

Constructor and destructor signatures are well-formed if they apply the correct number of type parameters to the matrix type and contain no free type variables (Wf-Constr and Wf-Destr). A matrix is type-checked by making sure that all constructor and destructor signatures are well-formed, that all equations are well-formed for every constructor/destructor combination, and that destructor names are unique in the matrix (Wf-Matr). To check uniqueness of names, we use *all-names-distinct*, which checks for a given list of signatures that all of their names are distinct. A program is well-formed if all of its matrices typecheck and the constructor signatures of the program (retrieved by *ctors*) are globally unique (Wf-Prog).

### **3.6 GADTs and GAcoDTs**

In the formalization so far, we have deliberately kept matrices abstract as a kind of abstract data type. Now we can bring in the harvest of our language design. GADTs and GAcoDTs are two different physical representations of matrices, see Fig. 6. They both contain nested vectors of equations and differ only in the order of the indices. With GADTs, the column labels are constructors, the row labels are functions, and a row corresponds to a function defined by pattern matching, with one equation for each case of the GADT. With GAcoDTs, the column labels are destructors, the row labels are functions, and a row corresponds to a function defined by copattern matching, with one equation for each case of


**Fig. 6.** GADTs and GAcoDTs

the GAcoDT. Hence both *defunctionalize* and *refunctionalize*, which swap the respective organization of the matrix, are just matrix transposition.

### **4 Properties of** *GADT <sup>T</sup>*

In this section, we prove type soundness for GADT <sup>T</sup> , the preservation of typing and operational semantics under de- and refunctionalization, and that our physical matrix representations of GADTs and GAcoDTs are accurate with respect to extension. All of these properties have been formalized and proven in Coq, based upon our Coq formalization of the previous section's formal syntax, semantics, and type system.

### **4.1 Type Soundness**

We start with the usual progress and preservation theorems.

**Theorem 1 (Progress).** *If* P *is a well-formed program and* t *is a term with no free type variables and* $P, \epsilon \vdash t : T$*, then* t *is either a value* v*, or there exists a term* $t'$ *such that* $P \vdash t \to t'$*.*

The proof of this theorem is a simple induction proof using a standard canonical forms lemma [30].

Preservation is much harder to prove. Often, preservation is proved using a substitution lemma which states that the substitution of a (term) variable by a term of the same type does not change the type of terms containing that term variable [30]. In GADT <sup>T</sup> , this lemma looks as follows:

**Lemma 1 (Term Substitution).** *If* $\overline{t}$ *is a list of terms with* $P, \epsilon \vdash \overline{t} : \overline{T}$ *and* $\overline{t'}$ *is a list of terms with* $P, \epsilon \vdash \overline{t'} : \overline{T'}$ *and* t *is a term with* $P, \overline{x} : \overline{T}, \overline{y} : \overline{T'} \vdash t : T$*, then* $P, \epsilon \vdash t[\overline{x} := \overline{t}, \overline{y} := \overline{t'}] : T$*.*

However, in E-Fire we perform both a substitution of terms and of types, hence the term substitution lemma is not enough to prove preservation; we also need a type substitution lemma.

**Lemma 2 (Type Substitution).** *If* $P, \Gamma \vdash t : T$*, then* $P, \Gamma[\overline{X} := \overline{T}] \vdash t[\overline{X} := \overline{T}] : T[\overline{X} := \overline{T}]$*.*

The proof of this lemma requires various auxiliary lemmas about properties (such as associativity) of type substitution. Taken together, these two lemmas are the two main intermediate results to prove the desired preservation theorem.

**Theorem 2 (Preservation).** *If* P *is a well-formed program and* t *is a term with no free type variables and* $P, \epsilon \vdash t : T$ *and* $P \vdash t \to t'$*, then* $P, \epsilon \vdash t' : T$*.*

### **4.2 Defunctionalization and Refunctionalization**

The preservation of typing and operational semantics by de/refunctionalization is a trivial consequence of the lemma below, which holds because both de- and refunctionalization are merely matrix transposition (see Fig. 6) and because the embedding *mkmatrix* of the physical matrices into the abstract representation ignores the organization of the physical matrices.

### **Lemma 3 (Matrix Transposition)**

$\forall m \in \mathcal{M}_{GADT}.\ \textit{mkmatrix}(m) = \textit{mkmatrix}(\textit{refunctionalize}(m)).$
$\forall m \in \mathcal{M}_{GAcoDT}.\ \textit{mkmatrix}(m) = \textit{mkmatrix}(\textit{defunctionalize}(m)).$

**Corollary 1 (Preservation of typing and reduction).** *De/refunctionalization of a matrix does not change the well-typedness of a program or the operational semantics of a term.*

### **4.3 Extensibility**

So far, we have seen that our chosen physical matrix representations are amenable to easy proofs of the preservation of properties under de- and refunctionalization. However, are they also indeed accurate representations of GADTs and GAcoDTs? GADTs and GAcoDTs are used for their *extensibility* along the destructor or constructor dimension, respectively, so we want our representations to reflect this.

We assume that matrices are represented as a traditional linear program by reading them row-by-row. Adding a new row is a non-invasive operation (adding to the program), whereas adding a column requires changes to the existing program.

We want to be able to extend our matrix representations with a new row, respectively representing the addition of a new destructor or constructor, without breaking well-typedness as long as the *newly added* equations typecheck with respect to the complete new program, and uniqueness of destructor/constructor names is preserved (globally, in the constructor case)<sup>4</sup>.

In order to formally state that this is indeed the case, we first formally capture extension of GADT and GAcoDT matrices with the following definitions. These already include the preservation of local uniqueness as a condition, i.e., the name of the newly added destructor or constructor must be fresh within the matrix.

**Definition 1 (GADT extension).** *Consider an* $m \in \mathcal{M}_{GADT}$ *with* $m = (a, \gamma, \delta, \{e_{D,C} \mid D \in \delta, C \in \gamma\})$*. For any* $D' \in \mathcal{D}$ *with* $D' \notin \delta$*, and equations* $e_{D',C}$ *for each* $C \in \gamma$*, we call* $(a, \gamma, \delta \cup \{D'\}, \{e_{D,C} \mid D \in \delta \cup \{D'\}, C \in \gamma\})$ *a* GADT extension of $m$ *with* $D'$ *and* $\{e_{D',C} \mid C \in \gamma\}$*.*

<sup>4</sup> The counterpart to this property on the side of the operational semantics is that the reduction relation of the new program restricted to terms befitting the old program equals the reduction relation of the old program; this however we omitted as it holds trivially when uniqueness is preserved.

**Definition 2 (GAcoDT extension).** *Consider an* $m \in \mathcal{M}_{GAcoDT}$ *with* $m = (a, \gamma, \delta, \{e_{C,D} \mid C \in \gamma, D \in \delta\})$*. For any* $C' \in \mathcal{C}$ *with* $C' \notin \gamma$*, and equations* $e_{C',D}$ *for each* $D \in \delta$*, we call* $(a, \gamma \cup \{C'\}, \delta, \{e_{C,D} \mid C \in \gamma \cup \{C'\}, D \in \delta\})$ *a* GAcoDT extension of $m$ *with* $C'$ *and* $\{e_{C',D} \mid D \in \delta\}$*.*

We now straightforwardly lift these definitions to programs: a program P′ *is a GA(co)DT extension (with some signature and equations) of another program* P if their matrices are identical except for one matrix name, and the underlying physical matrix (packed with *mkmatrix*) assigned to this name under P′ is a GA(co)DT extension (with this signature and equations) of the underlying physical matrix assigned under P.

Using this terminology we can now formally state and prove the extensibility of GADTs and GAcoDTs:

**Theorem 3 (Datatype Extensibility).** *If* P *is a well-formed program, and* P′ *is a GADT extension of* P *with* D′ *and equations* {e<sub>D′,C</sub> | C ∈ γ} *for the constructor signatures* γ *of the matrix to be extended, such that* P′, m ⊢ e<sub>D′,C</sub> OK in C, D′ *for each* C ∈ γ*, then* P′ *is well-formed.*

**Theorem 4 (Codatatype Extensibility).** *If* P *is a well-formed program, and* P′ *is a GAcoDT extension of* P *with* C′*, where the name of* C′ *is different from all constructor names in* P*, and equations* {e<sub>C′,D</sub> | D ∈ δ} *for the destructor signatures* δ *of the matrix to be extended, such that* P′, m ⊢ e<sub>C′,D</sub> OK in C′, D *for each* D ∈ δ*, then* P′ *is well-formed.*

In other words, in both cases we can type-check each row of a matrix in isolation, and if we put those rows together the resulting matrix and program containing that matrix will be well-formed. The results justify the familiar physical representation of programs where the variants of a GADT are fixed but we can freely add new functions that pattern-match on that GADT (and correspondingly for GAcoDTs).

### **5 Discussion**

In this section we discuss applications and limitations of our work, talk about directions for future work, and describe the Coq formalization of the definitions and proofs.

### **5.1 Applications**

*Language Design.* The most obvious application of our approach is to guide programming language design, namely by designing language features in such a way that the correspondence by de/refunctionalization is preserved. We believe that we can find "gaps" in existing languages by checking whether the corresponding dual feature exists, or by massaging a language feature in such a way that a clear dual exists. For instance, on the datatype and pattern-matching side, many features exist that have no clear counterpart on the codata side yet, such as pattern matching on multiple arguments, non-linear pattern matching, or pattern guards [22]. Some vaguely dual features exist on the codata side understood as "objects", e.g., in the form of multiple dispatch (such as [10]) or predicate dispatch [21]. We believe that the relation between pattern matching on multiple arguments and multiple dispatch is a particularly interesting direction for future work, since it would entail generalizing our two-dimensional matrices to matrices of arbitrary dimension.

Arguably, codata is the essence of object-oriented programming [12]. In any case, we believe that our design can also help with the design of object-oriented language features. For instance, there has been previous work on "object-oriented" GADTs [20,26] using extensions of generic types with certain classes of constraints. In Kennedy and Russo's [26] work, for example, a list interface could be defined like this:

```
interface List<A> {
  Integer size();
  Integer sum() where A=Integer; // Kennedy & Russo's syntax
}
```
If we compare this interface with the List codata type in Fig. 1 (right-hand side), then we can see that such constraints are readily supported by GAcoDTs; not because this feature was explicitly added but because it arises mechanically from dualizing GADTs.
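For comparison, here is a rough Python analogue of this interface (our own illustration, not Kennedy and Russo's code): Python cannot express the static `where A = Integer` constraint, so the sketch approximates it with a runtime check.

```python
# Rough Python analogue of the constrained List interface. The static
# constraint "where A = Integer" is approximated by a runtime check here,
# since Python's type system cannot enforce it.
class ConsList:
    def __init__(self, *elems):
        self.elems = list(elems)

    def size(self):
        return len(self.elems)

    def sum(self):
        # stand-in for the static constraint "where A = Integer"
        if not all(isinstance(e, int) for e in self.elems):
            raise TypeError("sum() requires A = Integer")
        return sum(self.elems)
```

In a GAcoDT, the constraint would instead be expressed statically in the destructor signature, exactly as in the interface above.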

As another potential influence on language design, we believe that "closedness" under defunctionalization and refunctionalization can be a desirable language design quality that prevents oddities where something can be expressed better using codata than using data (or vice versa). For instance, Carette et al. [5] propose a program representation (basically again a form of Church encoding, hence a codata encoding) that works in a simple Haskell'98 language but whose datatype representation would require GADTs. This suggests a language design flaw in that the codata fragment of functions supports a more powerful type system than the data fragment of (non-generalized) algebraic data types. That is, the type arguments of a codata generator function's result type may be arbitrarily specialized, e.g., the result type might be List[Nat], while the type of a constructor must be fully generic, e.g., List[A]. Our approach gives a criterion for when the type systems of both sides are "in sync".

*De/Refunctionalization as a Programmer Tool.* Semantics-preserving program transformations are not only interesting on the meta-level of programming language design but also because they define an equivalence relation on programs. For instance, consider the program on the left-hand side of Fig. 7, written in our GAcoDT language. Nat is a representation of Church-encoded<sup>5</sup> natural numbers as a GAcoDT with arity zero and a single destructor fold with a type

<sup>5</sup> This form of typed Church encoding is sometimes called Böhm-Berarducci encoding [4].

Left-hand side of Fig. 7:

```
codata Func[2] where
  apply[A,B](Func[A,B], A) : B
codata Nat[0] where
  fold[A](Nat,A,Func[A,A]) : A
fun zero(): Nat where
  fold[A](zero(),z,s) = z
fun succ(Nat): Nat where
  fold[A](succ(n),z,s) =
    apply[A,A](s,fold[A](n,z,s))
```

Right-hand side of Fig. 7:

```
data Nat[0] where
  zero() : Nat
  succ(Nat) : Nat
fun fold[A](Nat,A,Func[A,A]) : A where
  fold[A](zero(),z,s) = z
  fold[A](succ(n),z,s) =
    apply[A,A](s, fold[A](n,z,s))
```

**Fig. 7.** Defunctionalizing Church-encoded numbers (left) yields Peano numbers with a fold function (right)

parameter A. Defunctionalizing Nat yields the familiar Peano numbers with the standard fold function (right-hand side).

Such equivalences have proven useful for recognizing different forms of programs that are "the same elephant". For instance, Olivier Danvy and associates [16,17] have used defunctionalization, refunctionalization, and some other transformations such as CPS-transformation to inter-derive "semantic artifacts" such as big-step semantics, small-step semantics, and abstract machines ("The inter-derivations illustrated here witness a striking unity of computation, be this for reduction semantics, abstract machines, and normalization function: they all truly define the same elephant." – Danvy et al. [15]).

The applicability of these transformations is widened by our approach since we support arbitrary codata and not just functions. Exploring these new possibilities is an interesting area of future work.

Furthermore, programmers can employ our transformation as a tool for a more practical purpose. Consider that at some point during the development of a large software system, it is determined that the extensibility dimension for a particular aspect should be switched. That is, it is now thought that instead of allowing the addition of new variants (constructors), the software would be better served by fixing the variants and allowing the addition of new operations (destructors), or vice versa. If at this point it is further possible to make a closed-world assumption with regard to the particular type (represented as a matrix), since clients of the code are known and can be dealt with, it is reasonable to transpose the matrix representing that type. With GADT<sup>T</sup>, it is possible to do this independently of the other matrices in the program. (As already discussed, GADT<sup>T</sup> in its present form doesn't aim to be particularly developer-friendly, but we expect further language layers to be placed on top of GADT<sup>T</sup> to remedy this eventually.)

*Compiler Optimizations.* To use our automatable transformation as a programmer tool, it was important to be able to make a closed-world assumption, where we have the entire program, or more precisely the part which involves the matrix under consideration, at our disposal. A more automated setting where such an assumption can often be readily made is compilation. There, our matrix transposition transformation can be employed for whole-program optimization (such as [6]), as follows. An opportunity for optimization presents itself to the compiler when it is able to recognize an abstract machine in the code; optimizing this abstract machine is then an intermediate step, more generally applicable, that precedes hardware-specific optimizations [18]. As outlined above, defunctionalization can turn higher-order programs into first-order programs where this machine might be apparent. With our pair of languages, using our readily automatable defunctionalization (matrix transposition), it is possible to turn GAcoDT code into GADT code during compilation. The compiler can then leverage the potentially recognizable abstract machine form of the GADT code for its optimizations.

### **5.2 Limitations**

As we said, our design rationale for GADT<sup>T</sup> was to clarify the relation between GADTs and GAcoDTs, not to provide a convenient language for developers. Here we discuss some ways to address the limitations resulting from that decision.

*Local (Co)Pattern Matching, Including* λ*.* A significant limitation of GADT<sup>T</sup> is that (co)pattern matching is only allowed at the top level; we don't have "case" (or "match") constructs on the term level. Any local (co)pattern matching, however, can be converted to the top-level form by extracting it into a new top-level function definition. Variables occurring free in the (co)pattern-matching term must be passed to this function as arguments. In particular, anonymous local first-class functions, i.e., λ expressions, are a form of local copattern matching which can be encoded in this way; this particular conversion is traditionally called lambda lifting.
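As an illustration in Python (names of our own choosing), lambda lifting turns a local function with a free variable into a top-level function that receives that variable as an extra argument:

```python
# Before lifting: a local lambda that captures the free variable `factor`.
def scale_all(factor, xs):
    f = lambda x: factor * x        # `factor` is free in this lambda
    return [f(x) for x in xs]

# After lifting: the lambda has become a top-level function, and the
# formerly free variable is passed explicitly as an argument.
def scaled(factor, x):
    return factor * x

def scale_all_lifted(factor, xs):
    return [scaled(factor, x) for x in xs]
```

Both versions compute the same results; only the place where the function is defined has changed.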

*(Co)Pattern Matching on Zero or More Arguments.* (Co)pattern matching in GADT<sup>T</sup> is only possible on a single, distinguished argument (in our presentation, the first, but this is not important). Nested and multiple-argument matching can be encoded by *unnesting* à la Setzer et al. [35], producing auxiliary functions.

It is further not possible in GADT<sup>T</sup> to define a function entirely without (co)pattern matching. The workaround of (co)pattern matching on a dummy argument of type Unit is simple, but it is not obvious how to reconcile this encoding with the symmetry of de/refunctionalization.

*Type Inference.* We have deliberately avoided the question of type inference in this work. In general, we expect that the ample existing work on type inference for GADTs (such as Peyton Jones et al. [29], Schrijvers et al. [34], Chen and Erwig [7]) can be adapted to our setting and will also work for GAcoDTs. We see one complication, though: since destructors are only locally unique in GADT<sup>T</sup>, the (co)datatype a destructor belongs to must first be found via the type inferred for its distinguished, destructed argument. In other words, we do not know which destructor signature to consider before we know the destructed argument's type. This means that a type inference system which works inwards only, i.e., which discovers the types of the destructor arguments by looking at the signature, possibly leaving unification variables, and then checks that the recursively discovered types for the arguments conform, will not work.

#### **5.3 Termination and Productivity**

While termination and productivity are not the focus of this paper, we want to mention that our unified treatment of data and codata can also lead to a unified treatment of termination and productivity.

Here we want to illustrate informally that a simple syntactic criterion is sufficient to allow structural recursion and guarded corecursion. Syntactic termination checks are not expressive enough for many situations, hence we leave a proper treatment of termination/productivity checking (such as with sized types [2]) for future work; the purpose of this discussion is merely to illustrate that termination checking could also benefit from unified data and codata and not to propose a practically useful termination checker.

The basic idea is to restrict destructor calls in the right-hand sides of equations to have the form d[T](x,t) instead of d[T](t,t). That is to say, in the destructed argument position of destructor calls, we only allow variables from *within* the constructor pattern of the left-hand side. This criterion already guarantees termination (and hence also productivity [2]) in our system, i.e., the finiteness of all reduction sequences, which can be shown with the usual argument of a property that strictly decreases under reduction. A reduction step in GADT<sup>T</sup> with right-hand sides restricted like that strictly decreases, under lexicographic order, the pair of


This strict decrease can be proved by induction on the derivation of the reduction step. Since there are no infinitely decreasing sequences of these pairs, any reduction sequence must be finite. Note that our criterion by itself excludes far too many programs to be anywhere near practical, but it is readily conceivable how to relax it to restrict only *recursive* calls, together with a check that excludes mutual recursion.<sup>6</sup>

Let's look at Fig. 7 once more to illustrate that this criterion corresponds to both structural recursion and guarded corecursion. On the right-hand side of Fig. 7 we see that the first argument to the recursive call in the last line is n, which is allowed by our restriction because it is a syntactic part of the original input,

<sup>6</sup> For instance one might request the programmer to order the destructor names such that in equations for a certain destructor only destructors of lower order may be called.

succ(n) (structural recursion). The call to apply is not a problem because it is not a recursive call.<sup>7</sup> At the same time, if we look at the last line in the left-hand side of Fig. 7, we see that the criterion also corresponds to guarded corecursion. With copatterns, guarded corecursion means that we do not destruct the result of a recursive call (the "guard" itself is implicit in the pattern on the left-hand side of the equation). However, destructing that result would mean that we would have to call a destructor with the recursive call as its first argument, which is again forbidden by the syntactic criterion.

### **5.4 Going Beyond System F-like Polymorphism**

A particularly interesting direction for future work is to extend GADT<sup>T</sup> and go beyond System F-like polymorphism. For instance, F<sub>ω</sub> contains a copy of the simply-typed lambda calculus on the type level. Could one also generalize type-level functions to arbitrary codata and maybe use a variant of GADT<sup>T</sup> on the type level? Can dependent products as in the calculus of constructions [13] be generalized in a similar way? Can inductive types as in the calculus of inductive constructions be formulated such that there is a dual that is also related by de/refunctionalization? Thibodeau et al. [36] have formulated such a dual, but whether it can be massaged to fit into the setting described here is not obvious.

### **5.5 Coq Formalization**

Our Coq formalization is quite close to the traditional presentation chosen for this paper, but there are some technical differences. Both term and type variables are encoded via de Bruijn indices, which is rather standard for programming language mechanization. More interestingly, the syntax of the language in the Coq formalization expresses some of the constraints, which we express here via typing rules, via dependent types instead. Specifically, terms and types are indexed by the type variables that can appear inside. To represent matrices, we have developed a small library of dependently typed tables (where the cell types can depend on the row and column labels), such that the matrix type already guarantees that all type variables that show up in terms and types are bound. An earlier version of the formalization and the soundness proof used explicit well-formedness constraints to guarantee that all type variables are bound; the type soundness proof for this version was about twice as long as the one using dependent types. On the flip side, we had to "pay" for using dependent types in the form of many annoying "type casts" in definitions and theorems, owing to the fact that Coq's equality is intensional and not extensional [9, Sect. 10.3]. Finally, instead of using an evaluation context to define evaluation order as we did in Fig. 4, we have used traditional congruence rules. In the reduction relation as formalized in Coq, a single step can actually correspond to multiple steps in the formalization presented in the paper; however, this is just a minor technicality to slightly simplify the proofs.

<sup>7</sup> As long as we avoid mutual recursion, for instance by ensuring fold *>* apply.

### **6 Related Work**

"Theoreticians appreciate duality because it reveals deep symmetries. Practitioners appreciate duality because it offers two-for-the-price-of-one economy." This quote from Wadler [38] describes the spirit behind the design of GADT<sup>T</sup>, but of course this is not the first paper to talk about duality in programming languages. We have already discussed the most closely related works in previous sections; here, we compare GADT<sup>T</sup> with theoretical calculi that have related duality properties and point out an aspect of practical programming for which the duality of GADT<sup>T</sup> is relevant.

*Codata.* Hagino [23] pioneered the idea of dualizing data types: Whereas data types are used to define a type by the ways to *construct* it, codatatypes are dual to them in the sense that they are specified by their *deconstructions*. Abel et al. [1] introduce copatterns which allow functions producing codata to be defined by matching on the destructors of the result codatatype, dually to matching on the constructors of the argument datatype. All these developments occur in a world where function types are a given. The symmetric codata and data language fragments proposed by Rendel et al. [31] deviate from this: By enhancing destructor signatures with argument types, they provide a form of codata that is a generalization of first-class functions. Both the works by Rendel et al. [31] and Abel et al. [1] are simply-typed.

The (co)datatypes in the calculus of Downen and Ariola [19] also allow for user-defined function types. Their focus is different from ours, though, as they are mostly interested in evaluation strategies and their duality, and with regard to their calculus itself they work in an untyped setting. What is interesting in comparison with GADT<sup>T</sup> is how their (co)datatype declarations and signatures are inherently more symmetric, as they essentially describe a type system for the parametric sequent calculus. As such, the position of additional arguments in destructor signatures has a mirror counterpart in constructor signatures (to highlight this, Downen and Ariola [19] refer to destructors as "co-constructors").

*Duality of Computations and Values.* Staying with the idea of avoiding function types as primitives for a moment, Wadler [38] presents a "dual calculus" in which the previously astonishing result that call-by-name is De Morgan-dual to call-by-value [14] is clarified by defining implication (corresponding to function types via the Curry-Howard isomorphism) in two different ways depending on the intended evaluation regime. A somewhat similar approach, but perhaps more directly related to the data/codata duality, that also deals with the "troubling" coexistence of call-by-value and call-by-name, was proposed by Levy [27]. Levy [27] presents a calculus with a new evaluation regime, *call-by-push-value* (CBPV), which subsumes call-by-value and call-by-name by encoding the local choice of either in the terms of the calculus. More specifically, there are two kinds of terms in the CBPV calculus: computations and values, which can be inter-converted by "thunking" and "forcing". The terms for values and computations are said to be of *positive* type and of *negative* type, respectively. Thibodeau et al. [36] have built their calculus, which extends codatatypes to indexed codatatypes, on top of CBPV, with datatypes being positive and codatatypes being negative. We think that, when extending GADT<sup>T</sup> with local (co)pattern matching on the term level, perhaps with pattern- and copattern-matching terms mixed, it might be helpful to similarly recast the resulting language as a modification of the CBPV calculus of Levy [27].

### **7 Conclusions**

We have presented a formal calculus, GADT<sup>T</sup>, which uniformly describes both GADTs and their dual, GAcoDTs. GADTs and GAcoDTs can be converted back and forth by defunctionalization and refunctionalization, both of which correspond to a transposition of the matrix of equations, which contains one equation for each pair of constructor and destructor. We have formalized the calculus in Coq and mechanically verified its type soundness, its extensibility properties, and the preservation of typing and operational semantics by defunctionalization and refunctionalization.

We believe that our work can be of help for future language design since it describes a methodology to get a kind of "sweet spot" where data and codata constructs (including functions) are "in sync". We think that it can also be useful as a general program transformation tool, both on the program level as a kind of refactoring tool, but also as part of compilers and runtime systems. Finally, since codata is quite related to objects in object-oriented programming, we hope that our approach can help to clarify their relation and design languages which subsume both traditional functional and object-oriented languages.

**Acknowledgments.** We would like to thank Tillmann Rendel and Julia Trieflinger for providing some early ideas for the design of what eventually became GADT<sup>T</sup>. This work was supported by DFG project OS 293/3-1.

### **References**



### **Deterministic Concurrency: A Clock-Synchronised Shared Memory Approach**

Joaquín Aguado<sup>1(B)</sup>, Michael Mendler<sup>1</sup>, Marc Pouzet<sup>2</sup>, Partha Roop<sup>3</sup>, and Reinhard von Hanxleden<sup>4</sup>

<sup>1</sup> Otto-Friedrich-Universität Bamberg, Bamberg, Germany, joaquin.aguado@uni-bamberg.de


**Abstract.** Synchronous Programming (SP) is a universal computational principle that provides deterministic concurrency. The same input sequence with the same timing always results in the same externally observable output sequence, even if the internal behaviour generates uncertainty in the scheduling of concurrent memory accesses. Consequently, SP languages have always been strongly founded on mathematical semantics that support formal program analysis. So far, however, communication has been constrained to a set of primitive clock-synchronised shared memory (csm) data types, such as data-flow registers, streams and signals with restricted read and write accesses that limit modularity and behavioural abstractions.

This paper proposes an extension to the SP theory which retains the advantages of deterministic concurrency, but allows communication to occur at higher levels of abstraction than currently supported by SP data types. Our approach is as follows. To avoid data races, each csm type publishes a *policy interface* for specifying the admissibility and precedence of its access methods. Each instance of the csm type has to be policy-coherent, meaning it must behave deterministically under its own policy—a natural requirement if the goal is to build deterministic systems that use these types. In a policy-constructive system, all access methods can be scheduled in a policy-conformant way for all the types without deadlocking. In this paper, we show that a policy-constructive program exhibits deterministic concurrency in the sense that all policy-conformant interleavings produce the same input-output behaviour. Policies are conservative and support the csm types existing in current SP languages. Technically, we introduce a kernel SP language that uses arbitrary policy-driven csm types. A big-step fixed-point semantics for this language is developed for which we prove determinism and termination of constructive programs.

**Keywords:** Synchronous programming · Data abstraction · Clock-synchronised shared memory · Determinacy · Concurrency · Constructive semantics

### **1 Introduction**

Concurrent programming is challenging. Arbitrary interleavings of concurrent threads lead to non-determinism, with data races posing significant integrity and consistency issues [1]. Moreover, in many application domains such as safety-critical systems, *determinism* is indeed a matter of life and death. In medical-device software, for instance, the same input sequence from the sensors (with the same timing) must always result in the same output sequence for the actuators, even if the run-time software architecture regime is unpredictable.

Synchronous programming (SP) delivers *deterministic concurrency* out of the box<sup>1</sup>, which explains its success in the design, implementation and validation of embedded, reactive and safety-critical systems for the avionics, automotive, energy and nuclear industries. Right now, SP-generated code is flying on the Airbus 380 in systems like flight control, cockpit display, flight warning and anti-icing, to mention just a few. The SP mathematical theory has been fundamental for implementing correct-by-construction program-derivation algorithms and establishing formal analysis, verification and testing techniques [2]. For SCADE<sup>2</sup>, the industrial SP modelling language and software development toolkit, the formal SP background has been a key aspect of its certification at the highest level A of the aerospace standard DO-178B/C. This SP rigour has also been important for obtaining certifications in railway and transportation (EN 50128), industry and energy (IEC 61508), and automotive (TÜV and ISO 26262), as well as for ensuring full compliance with the safety standards of nuclear instrumentation and control (IEC 60880) and medical systems (IEC 62304) [3].

*Synchronous Programming in a Nutshell.* At the top level, we can imagine an SP system as a black box with inputs and outputs for interacting with its environment. There is a special input, called the *clock*, that determines when communication between system and environment can occur. The clock gets an input stimulus from the environment at discrete times. At those moments we say that the clock *ticks*. When there is no tick, there is no possible communication, as if system and environment were disconnected. At every tick, the system *reacts* by reading the current inputs and executing a *step function* that delivers outputs and changes the internal memory. For its part, the environment must synchronise with this reaction and not go ahead with more ticks. Thus,

<sup>1</sup> Milner's distinction between *determinacy* and *determinism* is that a computation is *determinate* if the same input sequence produces the same output sequence, as opposed to *deterministic* computations which in addition have identical internal behaviour/scheduling. In this paper we use both terms synonymously to mean determinacy in Milner's sense, i.e., observable determinism.

<sup>2</sup> SCADE is a product of ANSYS Inc. (http://www.esterel-technologies.com/).

in SP, we assume (*Synchrony Hypothesis*) that the time interval of a system reaction, also called *macro-step* or (*synchronous*) *instant*, appears instantaneous (has zero-delay) to the environment. Since each system reaction takes exactly one clock tick, we describe the evolution of the system-environment interaction as a synchronous (lock-step) sequence of macro-steps. The SP theory guarantees that all externally observable interaction sequences derived from the macro-step reactions define a functional input-output relation.
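The black-box view can be sketched in Python as a reactive loop (a toy illustration of ours, not a real SP runtime): one call to a step function per clock tick, with outputs determined solely by the input sequence and the internal memory.

```python
# Toy macro-step loop: each iteration of `run` corresponds to one clock
# tick; `step` is the step function of the black box. The step function
# and its input/output fields are invented for illustration.

def step(memory, inputs):
    # count ticks and report whether the current input level was high
    memory = memory + 1
    outputs = {"count": memory, "high": inputs.get("level", 0) > 0}
    return memory, outputs

def run(input_sequence):
    memory, trace = 0, []
    for inputs in input_sequence:      # one iteration per clock tick
        memory, outputs = step(memory, inputs)
        trace.append(outputs)
    return trace
```

Because `step` is a function of memory and inputs only, the same input sequence always produces the same output trace, which is exactly the determinacy property discussed above.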

The fact that the sequences of macro-steps take place in time and space (memory) has motivated two orthogonal developments of SP. The *data-flow* view regards input-output sequences as synchronous streams of data changing over time and studies the functional relationships between streams. Dually, the *control-flow* approach projects the information of the input-output sequences at each point in time and studies the changes of this global state as time progresses, i.e., from one tick to the next. The SP paradigm includes languages such as Esterel [4], Quartz [5] and SC [6] in the imperative control-flow style and languages like Signal [7], Lustre [8] and Lucid Synchrone [9] that support the declarative data-flow view. There are even mixed control-/data-flow languages such as Esterel V7 [10] or SCADE [3]. Independently of the execution model, the strength common to all of these SP languages is a precise formal semantics—an indispensable feature when dealing with the complexities of concurrency.

At a more concrete level, we can visualise an SP system as a white box where inside we find (graphical or textual) code. In the SP domain, the program must be divided into fragments corresponding to the macro-step reactions that will be executed instantaneously at each tick. Declarative languages usually organise these macro-steps by means of (internally generated) activation clocks that prescribe the blocks (nodes) that are performed at each tick. Instead, imperative textual languages provide a pause statement for explicitly delimiting code execution within a synchronous instant. In either case, the Synchrony Hypothesis conveniently abstracts away all the, typically concurrent, low-level *micro-steps* needed to produce a system reaction. The SP theory explains how the micro-step accesses to shared memory must be controlled so as to ensure that all internal (white-box) behaviour eventually stabilises, completing a deterministic macro-step (black-box) response. For more details on SP, the reader is referred to [2].

*State of the Art.* Traditional imperative SP languages provide constructs to model control-dominated systems. Typically, these include a concurrent composition of *threads* (sequential processes) that guarantees determinism and offers *signals* as the main means for data communication between threads. Signals behave like shared variables for which the concurrent accesses occurring within a macro-step are scheduled according to the following principles: A *pure signal* has a *status* that can be *present* (1) or *absent* (0). At the beginning of each macro-step, pure signals have status 0 by default. In any instant, a signal s can be explicitly *emitted* with the statement s.emit(), which atomically sets its status to 1. We can read the status of s with the statement s.pres(), so the control flow can branch depending on run-time signal statuses. Specifically, inside programs, if-then-else constructions wait for the appropriate combination of present and absent signal statuses to emit (or not) more signals. The main issue is to avoid inconsistencies due to circular causality resulting from decisions based on absent statuses. Thus, the order in which the access methods emit and pres are scheduled matters for the final result. The usual SP rule for ensuring determinism is that the pres test must wait until the final signal status is decided. If all signal accesses can be scheduled in this *decide-then-read* way then the program is *constructive*. All schedules that keep the decide-then-read order will produce the same input-output result. This is how SP reconciles concurrency and observable determinism, and it generates much of SP's algebraic appeal. Constructiveness of programs is what static techniques like the *must-can* analysis [4,11–13] verify, although in a more abstract manner. Pure signals are a simple form of *clock-synchronised shared memory* (csm) data types with access methods (operations) specific to this csm type.
Existing SP control-flow languages also support other restricted csm types such as valued signals and arrays [10] or sequentially constructive variables [6].
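As a concrete illustration (our own sketch, not code from any SP language), the decide-then-read rule for a pure signal can be mimicked as follows: within one macro-step, every emit is scheduled before any pres, so every schedule of the concurrent accesses yields the same input-output result.

```python
# Illustrative sketch: a pure signal and a scheduler enforcing the
# decide-then-read order within a single macro-step.

class PureSignal:
    """Status 0 (absent) by default; emit() atomically sets it to 1."""
    def __init__(self):
        self.status = 0

    def emit(self):
        self.status = 1

    def pres(self):
        return self.status

def macro_step(accesses):
    """Run one tick's concurrent accesses ("emit"/"pres") on a fresh
    signal in decide-then-read order; return the final status and the
    list of pres results."""
    s = PureSignal()
    for a in accesses:            # decide: perform all emissions first
        if a == "emit":
            s.emit()
    results = [s.pres() for a in accesses if a == "pres"]  # then read
    return s.status, results
```

Both `macro_step(["pres", "emit"])` and `macro_step(["emit", "pres"])` yield `(1, [1])`: the textual order of the concurrent accesses does not affect the outcome.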

*Contribution.* This paper proposes an extension to the SP model which retains the advantages of deterministic concurrency while widening the notion of constructiveness to cover more general csm types. This allows shared-memory communication to occur at higher levels of abstraction than currently supported. In particular, our approach subsumes both the notions of *Berry-constructiveness* [4] for Esterel and *sequential constructiveness* for SCL [14]. This is the first time that these SP communication principles are combined side-by-side in a single language. Moreover, our theory permits other predefined communication structures to coexist safely under the same uniform framework, such as data-flow variables [8], registers [15], Kahn channels [16], priority queues, and arrays, as well as other csm types currently unsupported in SP.

*Synopsis and Overview.* The core of our approach is presented in Sect. 2 where *policies* are introduced as a (constructive) synchronisation mechanism for arbitrary *abstract data types* (ADT). For instance, the policy of a pure signal is depicted in Fig. 1. It has two control *states* 0 and 1 corresponding to the two possible signal statuses. Transitions are decorated with method names pres, emit or with σ to indicate a clock tick.

The policy tells us whether a given method or tick is *admissible*, i.e., if it can be scheduled from a particular state<sup>3</sup>. In addition, transitions include a *blocking* set of method names as part of their *action* labels. This set determines a *precedence* between methods from a given state. A label m : L specifies that all methods in L take precedence over m.

**Fig. 1.** Pure signal policy.

An empty blocking set ∅ indicates no precedences. To improve visualisation, we

<sup>3</sup> The signal policy in Fig. 1 does not impose any admissibility restriction since methods pres and emit can be scheduled from every policy state.

highlight precedences by dotted (red) arrows tagged prec<sup>4</sup>. The *policy interface* in Fig. 1 specifies the decide-then-read protocol of pure signals as follows. At any instant, if the signal status is 0 then the pres test can only be scheduled if there are no more potential emit statements that can still update the status to 1. This explains the precedence of the emit transition over the self-loop with action label pres : {emit} from state 0. Conversely, the transitions pres and emit from state 1 have no precedences, meaning that the pres and emit methods are *confluent*, so they can be freely scheduled (interleaved). The reason is that a signal status 1 is already decided and can no longer be changed by either method in the same instant. In general, any two admissible methods that do not block each other must be confluent in the sense that the same policy state is reached independently of their order of execution. Note that all the σ transitions go to the *initial* state 0 since at each tick the SP system enters a new macro-step where all pure signals are initialised to the 0 status.
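The policy of Fig. 1 can also be written down as an explicit transition table. The following sketch uses our own encoding (not the paper's), with "tick" standing for the clock σ; each entry maps a (state, action) pair to its blocking set and successor state.

```python
# The pure-signal policy of Fig. 1 as a transition table (our encoding):
# (state, action) -> (blocking set, successor state).
SIGNAL_POLICY = {
    (0, "pres"): ({"emit"}, 0),   # pres waits for all pending emits
    (0, "emit"): (set(), 1),
    (1, "pres"): (set(), 1),      # status decided: freely interleavable
    (1, "emit"): (set(), 1),
    (0, "tick"): (set(), 0),      # the clock sigma resets to state 0
    (1, "tick"): (set(), 0),
}

def admissible(state, action):
    """An action is admissible iff the policy has a transition for it."""
    return (state, action) in SIGNAL_POLICY

def blocked_by(state, action):
    """Methods that take precedence over `action` in `state`."""
    return SIGNAL_POLICY[(state, action)][0]

def step(state, action):
    """Successor policy state after executing `action`."""
    return SIGNAL_POLICY[(state, action)][1]
```

In this encoding, `blocked_by(0, "pres")` is `{"emit"}` (the decide-then-read precedence) while `blocked_by(1, "pres")` is empty.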

Section 2 describes in detail the idea of a scheduling policy on general csm types. This leads to a type-level *coherence* property, which is a local form of determinism. Specifically, a csm type is *policy-coherent* if it satisfies the (policy) specification of admissibility and precedence of its access methods. The point is that a policy-coherent csm type per se behaves deterministically under its own policy—a very natural requirement if the goal is to build deterministic systems that use this type. For instance, the fact that Esterel signals are deterministic (policy-coherent) in the first place permits techniques such as the must-can analysis to get constructive information about deterministic programs. We show how policy-coherence implies a global determinacy property called *commutation*. Now, in a *policy-constructive* program all access methods can be scheduled in a *policy-conforming* way for all the csm types without deadlocking. We also show that, for policy-coherent types, a policy-constructive program exhibits deterministic concurrency in the sense that all policy-conforming interleavings produce the same input-output behaviour.

To implement a constructive scheduling mechanism parameterised in arbitrary csm type policies, we present the synchronous kernel language, called *Deterministic Concurrent Language* (DCoL), in Sect. 2.1. DCoL is a minimal language for studying the new mathematical concepts, but it can also act as an intermediate language for compiling existing SP languages. Section 3 presents its policy-driven operational semantics, for which determinacy and termination are proven. Section 3 also explains how this model generalises existing notions of constructiveness. We discuss related work in Sect. 4 and present our conclusions in Sect. 5.

A companion of this paper is the research report (https://www.uni-bamberg.de/fileadmin/uni/fakultaeten/wiai professuren/grundlagen informatik/papers MM/report-WIAI-102-Feb-2018.pdf) [17] which contains detailed proofs and additional examples of csm types.

<sup>4</sup> We tacitly assume that the tick transitions σ have the lowest *priority*, since the clock may only tick once the reaction is over. We could be more explicit and write σ : {pres, emit} as action labels for these transitions.

### **2 Synchronous Policies**

This section introduces a kernel synchronous *Deterministic Concurrent Language* (DCoL) for policy-conformant constructive scheduling which integrates policy-controlled csm types within a simple syntax. DCoL is used to discuss the behavioural (clock) abstraction limitations of current SP. Then policies are introduced as a mechanism for specifying the scheduling discipline for csm types which, in this form, can encapsulate arbitrary ADTs.

### **2.1 Syntax**

The syntax of DCoL is given by the following operators:

P, Q ::= skip | pause | P || Q | P ; Q | let x = c.m(e) in P | if e then P else Q | rec p. P | p

The first two statements correspond to the two forms of immediate *completion*: skip terminates instantaneously and pause waits for the logical clock to terminate the current instant. The operators P || Q and P ; Q are *parallel interleaving* and *imperative sequential* composition of threads with the standard operational interpretation. Reading and destructive updating are performed through the execution of method calls c.m(e) on a csm *variable* c ∈ O with a method m ∈ Mc. The sets O and Mc define the granularity of the available memory accesses. The construct let x = c.m(e) in P calls m on c with an input parameter determined by *value expression* e. It binds the return value to variable x and then sequentially executes the program P, which may depend on x. The execution of c.m(e) in general has the side effect of changing the internal memory of c. In contrast, the evaluation of expression e is side-effect free. For convenience we write x = c.m(e) ; P for let x = c.m(e) in P. When P does not depend on x we write c.m(e) ; P, and c.m(e) abbreviates c.m(e) ; skip. The exact syntax of value expressions e is irrelevant for this work and left open: it could be as simple as permitting only constant value literals or a full-fledged functional language. The *conditional* if e then P else Q has the usual interpretation. For simplicity, we may write if c.m(e) then P else Q to mean x = c.m(e) ; if x then P else Q. The *recursive closure* rec p. P binds the behaviour P to the program label p so it can be called from within P. Using this construct we can build iterative behaviours.
For instance, loop P end =df rec p. P ; pause ; p indefinitely repeats P in each tick. We assume that in a closure rec p. P the label p is (i) *clock guarded*, i.e., it occurs in the scope of at least one pause (meaning no instantaneous loops) and (ii) all occurrences of p are in the same thread. Thus, rec p. p is illegal because of (i) and rec p. (pause ; p || pause ; p) is not permitted because of (ii).
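For illustration, the DCoL operators can be rendered as an abstract syntax tree. The encoding below is our own sketch, not an implementation from the paper; names follow the grammar in the text.

```python
# A sketch of the DCoL operators as a Python AST (our encoding).
from dataclasses import dataclass
from typing import Any

@dataclass
class Skip: pass            # instantaneous termination

@dataclass
class Pause: pass           # wait for the next clock tick

@dataclass
class Par:                  # P || Q
    left: Any
    right: Any

@dataclass
class Seq:                  # P ; Q
    first: Any
    second: Any

@dataclass
class Let:                  # let x = c.m(e) in P
    x: str
    c: str
    m: str
    e: Any
    body: Any

@dataclass
class If:                   # if e then P else Q
    cond: Any
    then: Any
    els: Any

@dataclass
class Rec:                  # rec p. P
    label: str
    body: Any

@dataclass
class Call:                 # occurrence of a program label p
    label: str

def loop(P):
    """loop P end  =df  rec p. P ; pause ; p"""
    return Rec("p", Seq(P, Seq(Pause(), Call("p"))))
```

The derived form `loop(Skip())` expands to the clock-guarded closure rec p. skip ; pause ; p, matching the definition in the text.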

This syntax seems minimalistic compared to existing SP languages. For instance, it does not provide primitives for pre-emption, suspension or traps as in Quartz or Esterel. Recent work [18] has shown how these control primitives can be translated into the constructs of the SCL language, exploiting destructive update of sequentially constructive (SC) variables. Since SC variables are a special case of policy-controlled csm variables, DCoL is at least as expressive as SCL.

### **2.2 Limited Abstraction in SP**

The pertinent feature of standard SP languages is that they do not permit the programmer to express a sequential execution order for destructive updates of signals inside a tick. All such updates are considered concurrent and thus must either be combined or concern distinct signals. For instance, in languages such as Esterel V7 or Quartz, a parallel composition

$$(v = \texttt{xs.read}() \; ; \; \texttt{ys.emit}(v+1)) \; \parallel \; (\texttt{xs.emit}(1) \; ; \; \texttt{xs.emit}(5)) \tag{1}$$

of signal emissions is only constructive if a commutative and associative function is defined on the shared signal xs to combine the values assigned to it. But then, by the properties of this *combination function*, we get the same behaviour if we swap the assignments of values 1 and 5, or execute all in parallel as in

v = xs.read() || ys.emit(v + 1) || xs.emit(1) || xs.emit(5).

What if the second emission xs.emit(5) in (1) was intended to override the first xs.emit(1), as in normal imperative programming, so that the concurrent thread v = xs.read() ; ys.emit(v + 1) reads the updated value v = 5? Then we need to introduce a pause statement to separate the emissions by a clock tick and delay the assignment to ys, as in

```
(pause ; v = xs.read() ; ys.emit(v + 1)) || (xs.emit(1) ; pause ; xs.emit(5)).
```
This makes behavioural abstraction difficult. For instance, suppose nats is a synchronous reaction module, possibly composite and with its own internal clocking, which returns the stream of natural numbers. Every time its step function nats.step() is called it returns the next number and increments its internal state. If we want to pair up two successive numbers within one tick of an outer clock and emit them in a single signal ys, we would write something like x1 = nats.step() ; x2 = nats.step() ; ys.emit(x1, x2), where x1, x2 are thread-local value variables. This over-clocking is impossible in traditional SP because there is no imperative sequential composition by virtue of which we can call the step function of the same module instance twice within a tick. Instead, the two calls to nats.step() are considered concurrent and thus create non-determinacy in the value of ys.<sup>5</sup> To avoid a compiler error we must separate the calls by a clock as

<sup>5</sup> In Esterel V7 it is possible to use a module twice in a "sequential" composition x1 = nats.step() ; x2 = nats.step(). However, the two occurrences of nats are distinct instances with their own internal state. Both calls will thus return the same value.

in x1 = nats.step() ; pause ; x2 = nats.step() ; ys.emit(x1, x2), which breaks the intended clock abstraction.
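The nats module from the text can be sketched as follows (our own illustration); it makes the contrast with the Esterel V7 reading of footnote 5 explicit: sequential calls on the same instance yield successive numbers, while two distinct instances each restart the stream.

```python
# Sketch of the nats stream module: each step() call returns the next
# natural number and advances the internal state.
class Nats:
    def __init__(self):
        self.n = 0

    def step(self):
        v = self.n
        self.n += 1
        return v

nats = Nats()
x1 = nats.step()       # sequential calls on the *same* instance
x2 = nats.step()       # over-clocking: two steps within one outer tick

# Two distinct instances (the Esterel V7 reading of footnote 5) would
# each return the first element instead.
a, b = Nats(), Nats()
y1, y2 = a.step(), b.step()
```

Here `(x1, x2)` is `(0, 1)` whereas `(y1, y2)` is `(0, 0)`, which is why the two readings of "sequential" composition differ observably.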

The data abstraction limitation of traditional SP is that it is not directly possible to encapsulate a composite behaviour on synchronised signals as a shared synchronised object. For this, the simple decide-then-read signal protocol must be generalised, in particular, to distinguish between concurrent and sequential accesses to the shared data structure. A concurrent access x1 = nats.step() || x2 = nats.step() must give the same value for x1 and x2, while a sequential access x1 = nats.step() ; x2 = nats.step() must yield successive values of the stream. In a sequence x = xs.read() ; xs.emit(v) the x does not see the value v, but in a parallel composition x = xs.read() || xs.emit(v) we may want the read to wait for the emission. The rest of this section covers our theory of policies in which this is possible. The modularity issue is reconsidered in Sect. 2.6.

### **2.3 Concurrent Access Policies**

In the white-box view of SP, an imperative program consists of a set of threads (sequential processes) and some csm variables for communication. Due to concurrency, a given *thread under control* (tuc) has the chance to access the shared variables only from time to time. For a given csm variable, a *concurrent access policy* (cap) is the locking mechanism used to control the accesses of the current tuc and its environment. The locking is to ensure that determinacy of the csm type is not broken by the concurrent accesses. A cap is like a policy which has extra transitions to model potential environment accesses outside the tuc. Concretely, a cap is given by a state machine where each transition label a : L codifies an *action* a taking place on the shared variable with *blocking set* L, where L is a set of methods that take precedence over a. The action is either a *method* m : L, a *silent action* τ : L or a *clock tick* σ : L. A transition m : L expresses that in the current cap control state, the method m can be called by the tuc, provided that no method in L is called concurrently. There is a *Determinacy Requirement* guaranteeing that each method call by the tuc has a unique blocking set and successor state. Additionally, the execution of methods by the cap must be *confluent* in the sense that if two methods are admissible and do not block each other, then the cap reaches the same policy state no matter the order in which they are executed. This preserves determinism for concurrent variable accesses. A transition τ : L internalises method calls by the tuc's concurrent environment, which are uncontrollable for the tuc. In the sequel, the actions in Mc ∪ {σ} will be called *observable*. A transition σ : L models a clock synchronisation step of the tuc. Like method calls, such clock ticks must be determinate as stated by the Determinacy Requirement.
Additionally, the clock must always wait for any predicted concurrent τ-activity to complete. This is the *Maximal Progress Requirement*. Note that we do not need confluence for clock transitions since they are not concurrent.

**Definition 1.** *A* concurrent access policy *(*cap*) of a* csm *variable c with (access) methods Mc is a state machine consisting of a set of* control states *Pc, an* initial state *ε ∈ Pc and a labelled* transition relation *→ ⊆ Pc × Ac × Pc with* action *labels Ac = (Mc ∪ {τ, σ}) × 2^Mc. Instead of (μ1, (a, L), μ2) ∈ → we write μ1 −a:L→ μ2. We then say action a is* admissible *in state μ1 and* blocked *by all methods m ∈ L ⊆ Mc. When the blocking set L is irrelevant we drop it and write μ1 −a→ μ2. A* cap *must satisfy the following conditions:*

*(1) Determinacy: if μ −a:L1→ μ1 and μ −a:L2→ μ2 then L1 = L2 and μ1 = μ2.*
*(2) Confluence: if μ −m1:L1→ μ1 and μ −m2:L2→ μ2 with m1 ∉ L2 and m2 ∉ L1, then μ1 −m2→ μ′ and μ2 −m1→ μ′ for some state μ′.*
*(3) Maximal Progress: if μ −τ:L→ μ′ for some μ′, then the clock σ is not admissible in μ.*

*A* policy *is a* cap *without any (concurrent) τ activity, i.e., every μ −a→ μ′ implies that a is observable.*
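To make the Determinacy and Confluence requirements concrete, here is a small sketch (our own encoding, not from the paper) that checks them on a finite cap given as a list of transitions (state, action, blocking set, successor).

```python
# Sketch: checking the requirements of Definition 1 on a finite cap.

def check_determinacy(trans):
    """Each (state, action) pair has at most one blocking set/successor."""
    seen = {}
    for (p, a, blk, q) in trans:
        if (p, a) in seen and seen[(p, a)] != (frozenset(blk), q):
            return False
        seen[(p, a)] = (frozenset(blk), q)
    return True

def check_confluence(trans):
    """Admissible methods that do not block each other must commute to
    the same policy state."""
    succ = {(p, a): (set(blk), q) for (p, a, blk, q) in trans}
    methods = {a for (_, a, _, _) in trans if a not in ("tau", "tick")}
    for p in {p for (p, _, _, _) in trans}:
        for m1 in methods:
            for m2 in methods:
                if (p, m1) in succ and (p, m2) in succ:
                    b1, q1 = succ[(p, m1)]
                    b2, q2 = succ[(p, m2)]
                    if m2 not in b1 and m1 not in b2:
                        r1 = succ.get((q1, m2))
                        r2 = succ.get((q2, m1))
                        if r1 and r2 and r1[1] != r2[1]:
                            return False
    return True

# The pure-signal policy of Fig. 1 passes both checks:
SIGNAL_TRANS = [
    (0, "pres", {"emit"}, 0), (0, "emit", set(), 1),
    (1, "pres", set(), 1),    (1, "emit", set(), 1),
]
```

In state 0 the pair (emit, pres) is exempt from the commutation test precisely because emit is in the blocking set of pres, matching the discussion of Fig. 1.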

The use of a cap as a concurrent policy arises from the notion of enabling. Informally, an observable action a ∈ Mc ∪ {σ} is enabled in a state μ of a cap if it is admissible in μ *and* in all subsequent states reachable under arbitrary silent steps. To formalise this we define *weak transitions* μ1 ⇒ μ2 inductively to express that either μ1 = μ2 or μ1 ⇒ μ′ and μ′ −τ→ μ2. We exploit the determinacy of observable actions a ∈ Mc ∪ {σ} and write μ · a for the unique μ′ such that μ −a→ μ′, if it exists.

**Definition 2.** *Given a* cap *(Pc, ε, −→), an observable action a ∈ Mc ∪ {σ} is* enabled *in state μ ∈ Pc, written μ ↓c a, if μ′ · a exists for all μ′ such that μ ⇒ μ′. A sequence *a* ∈ (Mc ∪ {σ})∗ of observable actions is* enabled *in μ ∈ Pc, written μ ↓c *a*, if (i) *a* = ε or (ii) *a* = a *b*, μ ↓c a and μ · a ↓c *b*.*
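Definition 2's enabling via weak transitions can be sketched as follows (our own encoding; the cap below, in which the environment may still emit via a silent step, is a hypothetical example, not one from the paper).

```python
# Sketch of enabledness (Definition 2): an observable action is enabled
# in mu iff it is admissible in mu and in every state reachable by
# silent "tau" steps (weak transitions). Transitions are encoded as
# (state, action, blocking set, successor).

def enabled(trans, mu, a):
    reach, frontier = {mu}, [mu]
    while frontier:                       # compute the tau-closure of mu
        p = frontier.pop()
        for (p0, act, _blk, q) in trans:
            if p0 == p and act == "tau" and q not in reach:
                reach.add(q)
                frontier.append(q)
    admissible = {(p, act) for (p, act, _b, _q) in trans}
    return all((p, a) in admissible for p in reach)

# Hypothetical cap: in state 0 the environment may still emit (tau step).
CAP = [
    (0, "tau", set(), 1), (0, "emit", set(), 1),
    (1, "pres", set(), 1), (1, "emit", set(), 1),
]
```

In this cap, `enabled(CAP, 0, "pres")` is false because pres is not admissible in state 0 itself, while emit is enabled everywhere.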

*Example 1.* Consider the policy of an Esterel pure signal s in Fig. 1. An edge labelled a:L from state μ1 to μ2 corresponds to a transition μ1 −a:L→ μ2. The start state is ε = 0 and the methods Ms = {pres, emit} are always admissible, i.e., μ · m is defined in each state μ for all methods m. The presence test does not change the state and any emission sets it to 1, i.e., μ · pres = μ and μ · emit = 1 for

**Fig. 2.** Synchronous IVar.

all μ ∈ Ps. Each signal status is reset to 0 with the clock tick, i.e., μ · σ = 0. Clearly, the policy of s satisfies Determinacy. A presence test on a signal that is not emitted yet has to wait for all pending concurrent emissions, that is, emit blocks pres in state 0, i.e., 0 −pres:{emit}→ 0. Otherwise, no transition is blocked. Also, all competing transitions μ −m1:L1→ μ1 and μ −m2:L2→ μ2 that do not block each other are of the form μ1 = μ2, from which Confluence follows. Note that since there are no silent transitions, Maximal Progress is always fulfilled, too. Moreover, an action sequence is enabled in a state μ (Definition 2) iff it corresponds to a path in the automaton starting from μ. Hence, for *m* ∈ M∗s we have 0 ↓s *m* iff *m* is in the regular language<sup>6</sup> pres∗ + pres∗ emit (pres + emit)∗, and 1 ↓s *m* for all *m* ∈ M∗s.

Contrast s with the policy of a synchronous *immutable variable* (IVar) c shown in Fig. 2 with methods Mc = {get, put}. During each instant an IVar can be written (put) at most once and cannot be read (get) until it has been written. No value is stored between ticks, which means the memory is only temporary and can be reused, e.g., IVars can be implemented by wires. Formally, μ ↓c put iff μ = 0, where 0 is the initial empty state, and μ ↓c get iff μ = 1, where 1 is the filled state. The transition 0 −put:{put}→ 1 switches to the filled state, where get is admissible but put is not any more. The blocking set {put} means there cannot be other concurrent threads writing c at the same time.
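The IVar policy just described can be tabulated in the same illustrative encoding as before (our own, with "tick" standing for σ): state 0 is empty, state 1 is filled, put is admissible only in 0 and get only in 1.

```python
# The synchronous IVar policy of Fig. 2 as a table (our encoding):
# (state, action) -> (blocking set, successor state).
IVAR_POLICY = {
    (0, "put"): ({"put"}, 1),   # at most one write per instant
    (1, "get"): (set(), 1),     # reads only after the write
    (0, "tick"): (set(), 0),
    (1, "tick"): (set(), 0),    # no value survives the tick
}
```

The absent entries encode the admissibility restrictions: there is no (0, get) transition (no read before a write) and no (1, put) transition (single assignment per instant).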

### **2.4 Enabling and Policy Conformance**

A policy describes what a single thread can do to a csm variable c when it operates alone. In a cap all potential activities of the environment are added as τ transitions to block the tuc's accesses. To implement this τ-locking we define an operation that generates a cap [μ, γ] out of a policy. In this construction, μ ∈ Pc is a policy state recording the history of methods that have been performed on c so far (*must* information). The second component γ ⊆ M∗c is a prediction for the sequences of methods that can still potentially be executed by the concurrent environment (*can* information).

**Definition 3.** *Let* (Pc, ε, →) *be a policy. We define a* cap *for c whose states are pairs* [μ, γ] *such that μ ∈ Pc is a policy state and γ ⊆ M∗c is a prediction. The initial state is* [ε, M∗c] *and the transitions are as follows:*

*(1)* [μ1, γ] −m:L→ [μ2, γ] *if* μ1 −m:L→ μ2 *in the policy and no sequence in γ starts with a method in L;*
*(2)* [μ1, γ1] −τ:L→ [μ2, γ2] *if* μ1 −m:L→ μ2 *in the policy for some method m with m γ2 ⊆ γ1.*

Silent steps arise from the concurrent environment: A step [μ1, γ1] −τ:L→ [μ2, γ2] removes some prefix method m from the environment prediction γ1, which contracts to an updated suffix prediction γ2 with m γ2 ⊆ γ1. This method m is executed on the csm variable, changing the policy state to μ2 = μ1 · m. A method m is enabled, written [μ, γ] ↓c m, if for all [μ1, γ1] which are τ-reachable from [μ, γ], method m is admissible, i.e., [μ1, γ1] −m→ [μ2, γ1] for some μ2.

*Example 2.* Consider concurrent threads P1 || P2, where P2 = zs.put(5) ; u = ys.get() and P1 = v = zs.get() ; ys.put(v + 1), with IVars zs, ys according to

<sup>6</sup> We are more liberal than Esterel where emit cannot be called sequentially after pres.

Example 1. Under the IVar policy the execution is deterministic: first P2 writes to zs, then P1 reads zs and writes to ys, whereupon finally P2 reads ys. Suppose the variables have reached policy states μzs and μys and the threads are ready to execute the residual programs P′i waiting at some method call ci.mi(vi), respectively. Since thread P′i is concurrent with the other P′3−i, it can only proceed if mi is not blocked by P′3−i, i.e., if [μci, *can*ci(P′3−i)] ↓ci mi, where *can*c(P) ⊆ M∗c is the set of method sequences predicted for P on c.

Initially we have μzs = 0 = μys. Since method get is not admissible in state 0, get is not enabled in [0, *can*zs(P2)] by Definitions 3 and 2. So P1 is blocked. The zs.put of P2, however, can proceed. First, since no predicted method sequence in *can*zs(P1) = {get} of P1 starts with put, the transition 0 −put:{put}→ 1 implies that [0, *can*zs(P1)] −put:{put}→ [1, *can*zs(P1)] by Definition 3(1). Moreover, since get of P1 is not admissible in 0, there are no silent transitions out of [0, *can*zs(P1)] according to Definition 3(2). Thus [0, *can*zs(P1)] ↓zs put, as claimed.

When the zs.put is executed by P2 it turns into P′2 = u = ys.get() and the policy state for zs advances to μ′zs = 1, while ys is still at μys = 0. Now the ys.get of P′2 blocks for the same reason as the zs.get was blocked in P1 before. But since P2 has advanced, its prediction on zs reduces to *can*zs(P′2) = ∅. Therefore, the transition 1 −get:∅→ 1 implies [1, *can*zs(P′2)] −get:∅→ [1, *can*zs(P′2)] by Definition 3(1). Also, there are no silent transitions out of [1, *can*zs(P′2)] by Definition 3(2), and so [μ′zs, *can*zs(P′2)] ↓zs get by Definition 2. This permits P1 to execute zs.get() and proceed to P′1 = ys.put(5 + 1). The policy state of zs is not changed by this, and neither is the state of ys, whence P′2 is still blocked. Yet, we have [μys, *can*ys(P′2)] ↓ys put, which lets P′1 complete ys.put. It reaches P″1 with *can*ys(P″1) = ∅ and changes the policy state of ys to μ′ys = 1. At this point, [μ′ys, *can*ys(P″1)] ↓ys get, which means P′2 unblocks to execute ys.get.
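The schedule walked through in Example 2 can be replayed with a small sketch (our own encoding; we model only clause (2) of Definition 3, the silent environment steps over admissible prediction prefixes, and ignore blocking sets).

```python
# Sketch of prediction-based enabling in the derived cap [mu, gamma]
# for the IVar policy: a method m is enabled for the thread iff it is
# admissible in mu and in every state the environment prediction gamma
# can silently drive the variable to.

IVAR = {(0, "put"): 1, (1, "get"): 1}   # admissible method -> successor

def tau_reachable(mu, gamma):
    """Policy states reachable by the environment executing admissible
    prefixes of its predicted sequences (silent tau steps)."""
    reach = {mu}
    for seq in gamma:
        state = mu
        for m in seq:
            if (state, m) not in IVAR:
                break                    # environment is stuck here
            state = IVAR[(state, m)]
            reach.add(state)
    return reach

def cap_enabled(mu, gamma, m):
    return all((s, m) in IVAR for s in tau_reachable(mu, gamma))
```

With P1's prediction on zs being `[("get",)]` and P2's being `[("put",)]`, this reproduces the first scheduling step of Example 2: P2's put is enabled while P1's get is not.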

**Definition 4.** Let c be a csm variable with a policy. A method sequence *m*1 *blocks* another *m*2 in state μ if μ ↓c *m*2 but not [μ, {*m*1}] ↓c *m*2. Two method sequences *m*1 and *m*2 are *concurrently enabled* in μ if μ ↓c *m*1, μ ↓c *m*2, and neither of them blocks the other in μ.

Our operational semantics will only let a tuc execute a sequence *m* provided [μ, γ] ↓c *m*, where μ is the current policy state of c and γ is the predicted activity in the tuc's concurrent environment. Symmetrically, the environment will execute any *n* ∈ γ only if it is enabled with respect to *m*, i.e., if [μ, {*m*}] ↓c *n*. This means that *m* and *n* are concurrently enabled at μ. Policy coherence (Definition 5 below) then implies that every interleaving of the sequences *m* and any *n* ∈ γ leads to the same return values and final variable state (Proposition 1).

#### **2.5 Coherence and Determinacy**

A *method call* m(v) combines a method m ∈ Mc with a method parameter<sup>7</sup> v ∈ D, where D is a universal domain for method arguments and return values,

<sup>7</sup> This is without loss of generality since D may contain value tuples.

including the special don't-care value ⊥ ∈ D. We denote by Ac = {m(v) | m ∈ Mc, v ∈ D} the set of all method calls on object c. Sequences of method calls α ∈ A∗c can be abstracted back into sequences of methods α# ∈ M∗c by dropping the method parameters: ε# = ε and (m(v) α)# = m α#.

Coherence concerns the semantics of method calls as state transformations. Let Sc be the domain of memory states of the object c with initial state *init*c ∈ Sc. Each method call m(v) ∈ Ac corresponds to a semantical action [[m(v)]]c ∈ Sc → (D × Sc). If s ∈ Sc is the current state of the object then executing a call m(v) on c returns a pair (u, s′) = [[m(v)]]c(s), where the first projection u ∈ D is the return value from the call and the second projection s′ ∈ Sc is the new updated state of the variable. For convenience, we will denote u = π1 [[m(v)]]c(s) by u = s.m(v) and s′ = π2 [[m(v)]]c(s) by s′ = s · m(v). The action notation is extended to sequences of calls α ∈ A∗c in the natural way: s · ε = s and s · (m(v) α) = (s · m(v)) · α.

For policy-based scheduling we assume an abstraction function mapping a memory state s ∈ Sc into a policy state s# ∈ Pc. Specifically, *init*#c = ε. Further, we assume the abstraction commutes with method execution, in the sense that if we execute a sequence of calls and then abstract the final state, we get the same as if we executed the policy automaton on the abstracted state in the first place. Formally, (s · α)# = s# · α# for all s ∈ Sc and α ∈ A∗c.

**Definition 5 (Coherence).** *A* csm *variable c is* policy-coherent *if for all method calls a, b ∈ Ac, whenever a# and b# are concurrently enabled in state s# (Definition 4) for a state s ∈ Sc, then a and b are* confluent *in the sense that s.a = (s · b).a, s.b = (s · a).b and s · a · b = s · b · a.*

*Example 3.* Esterel pure signals do not carry any data value, so their memory state coincides with the policy state, Ss = Ps = {0, 1} and s# = s. An emission emit does not return any value but sets the state of s to 1, i.e., s.emit(⊥) = ⊥ ∈ D and s · emit(⊥) = 1 ∈ Ss. A present test returns the state, s.pres(⊥) = s, but does not modify it, s · pres(⊥) = s. From the policy in Fig. 1 we find that the concurrently enabled pairs a#, b# in a state s# according to Definition 4 are (i) a = b ∈ {pres(⊥), emit(⊥)} for arbitrary s, or (ii) s = 1, a = emit(⊥) and b = pres(⊥). In each of these cases we verify s.a = (s · b).a, s.b = (s · a).b and s · a · b = s · b · a without difficulty. Note that emit and pres are concurrently enabled in state 1 since the order of execution is irrelevant if s = 1. On the other hand, they are not concurrently enabled in state 0, because there the two methods are not confluent. Specifically, 0.pres(⊥) = 0 ≠ 1 = (0 · emit(⊥)).pres(⊥).
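The confluence equations of Example 3 are easy to machine-check. In the sketch below (our own encoding) the don't-care argument and return value are modelled as None.

```python
# Verifying the coherence equations for the pure signal semantics:
# for every concurrently enabled pair, the two execution orders must
# agree on return values and on the final state.

def call(s, m):
    """[[m]] : state -> (return value, new state) for a pure signal."""
    return (None, 1) if m == "emit" else (s, s)   # pres returns the status

def ret(s, m):
    return call(s, m)[0]

def upd(s, m):
    return call(s, m)[1]

def confluent(s, a, b):
    return (ret(s, a) == ret(upd(s, b), a) and
            ret(s, b) == ret(upd(s, a), b) and
            upd(upd(s, a), b) == upd(upd(s, b), a))
```

As in the example, the pair (emit, pres) is confluent in state 1 but not in state 0, where pres observes 0 before the emission and 1 after it.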

A special case are *linear precedence policies*, where $\mu \vdash_c \downarrow m$ for all $m \in M_c$ and $\mu \vdash_c m \to n$ is a linear ordering on $M_c$, for all policy states $\mu$. Then no state satisfies $\mu \vdash_c m_1 \diamond m_2$, so there is no concurrency and thus no confluence requirement to satisfy at all. Coherence of $c$ holds trivially, whatever the semantics of the method calls. Of any two admissible methods, one takes precedence over the other, and thus the enabling relation becomes deterministic. There is, however, a risk of deadlock, which can be excluded if we assume that threads always call methods in order of decreasing precedence.

The other extreme case is where the policy makes all methods concurrently enabled, i.e., $\mu \vdash_c m_1 \diamond m_2$ for all policy states $\mu$ and methods $m_1, m_2$. This avoids deadlock completely and gives maximal concurrency but imposes the strongest confluence condition, viz. that the resulting variable state must be the same regardless of the scheduling order of any two methods. This requires complete isolation of the effects of any two methods. Such an extreme is used, e.g., in the CR library [19]. The typical csm variable, however, will strike a tradeoff between these two extremes: it will impose a sensible set of precedences that is strong enough to ensure coherent implementations, and thus determinacy for policy-conformant scheduling, while remaining sufficiently relaxed to permit concurrent implementations and to avoid unnecessary deadlocks, which would risk programs being rejected by the compiler as un-schedulable.

Whatever the policies, if the variables are coherent, then all policy-conformant interleavings are indistinguishable for each csm variable. To state schedule invariance in its general form we lift method actions and independence to multi-variable sequences of method calls $A = \{c.m(v) \mid c \in O,\ m(v) \in A_c\}$. For a given sequence $\alpha \in A^{*}$ let $\pi_c(\alpha) \in A_c^{*}$ be the projection of $\alpha$ on $c$, formally $\pi_c(\varepsilon) = \varepsilon$, $\pi_c(c.m(v)\,\alpha) = m(v)\,\pi_c(\alpha)$ and $\pi_c(c'.m(v)\,\alpha) = \pi_c(\alpha)$ for $c' \neq c$. A global memory $\Sigma \in S = \prod_{c \in O} S_c$ assigns a local memory $\Sigma.c \in S_c$ to each variable $c$. We write $\mathit{init}$ for the initial memory that has $\mathit{init}.c = \mathit{init}_c$ and $(\mathit{init}.c)^{\#} = \varepsilon \in P_c$.

Given a global memory $\Sigma \in S$ and sequences $\alpha, \beta \in A^{*}$ of method calls, we extend the independence relation of Definition 4 variable-wise, defining $\Sigma \vdash \alpha \diamond \beta$ iff $(\Sigma.c)^{\#} \vdash_c (\pi_c(\alpha))^{\#} \diamond (\pi_c(\beta))^{\#}$ for all $c \in O$. The application of a method call $a \in A$ to a memory $\Sigma \in S$ is written $\Sigma.a$ and defined by $(\Sigma.(c.m(v))).c = (\Sigma.c).m(v)$ and $(\Sigma.(c.m(v))).c' = \Sigma.c'$ for $c' \neq c$. Analogously, method actions are lifted to global memories, i.e., $(\Sigma \triangleright c.m(v)).c' = \Sigma.c'$ if $c' \neq c$ and $(\Sigma \triangleright c.m(v)).c = \Sigma.c \triangleright m(v)$.
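To illustrate these liftings, here is a small Python sketch (our own illustration, not from the paper; the dictionary representation of a global memory and the `update` parameter are assumptions): projection keeps only the calls on one variable, and applying a call changes only that variable's component.

```python
# Sketch of the projection pi_c and the variable-wise lifting of
# method actions to a global memory (variable name -> local state).

def project(c, alpha):
    """pi_c(alpha): keep the calls on variable c, dropping the 'c.' prefix."""
    return [m for (var, m) in alpha if var == c]

def apply_call(sigma, call, update):
    """(Sigma > c.m).c' = Sigma.c' for c' != c, else Sigma.c > m."""
    c, m = call
    new = dict(sigma)             # copy: only variable c changes
    new[c] = update(sigma[c], m)
    return new

# Example with two pure signals (update semantics as in Example 3):
upd = lambda s, m: 1 if m == "emit" else s
sigma = {"xs": 0, "ys": 0}
alpha = [("xs", "emit"), ("ys", "pres"), ("xs", "pres")]

assert project("xs", alpha) == ["emit", "pres"]
for a in alpha:
    sigma = apply_call(sigma, a, upd)
assert sigma == {"xs": 1, "ys": 0}   # only xs was emitted
```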

**Proposition 1 (Commutation).** *Let all* csm *variables be policy-coherent and $\Sigma \vdash a \diamond \alpha$ for a memory $\Sigma \in S$, a method call $a \in A$ and a sequence of method calls $\alpha \in A^{*}$. Then, $\Sigma \triangleright a \triangleright \alpha = \Sigma \triangleright \alpha \triangleright a$ and $\Sigma.a = (\Sigma \triangleright \alpha).a$.*

### **2.6 Policies and Modularity**

Consider the synchronous data-flow network cnt in Fig. 3b with three process nodes, a multiplexer mux, a register reg and an incrementor inc. Their DCoL code is given in Fig. 3a. The network implements a settable counter, which produces at its output ys a stream of consecutive integers, incremented with each clock tick. The wires ys, zs and ws are IVars (see Example 2) carrying a single integer value per tick. The input xs is a pure Esterel signal (see Example 1). The counter state is stored by reg in a local variable xv with read and write methods that can be called by a single thread only. The register is initialised to the value 0 and in each subsequent tick the value at ys is stored. The inc node takes the value at zs and increments it. When the signal xs is absent, mux passes the incremented value on ws to ys for the next tick. Otherwise, if xs is present, mux resets ys.

The evaluation order is implemented by the policies of the IVars ys, zs and ws. In each case the put method takes precedence over get which makes sure that the latter is blocked until the former has been executed. The causality cycle of the feedback loop is broken by the fact that the reg node first sends the current counter value to zs before it waits for the new value at ys. The other nodes mux and inc, in contrast, first read their inputs and then send to their output.
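The put-over-get discipline of the IVar wires can be illustrated in Python (our own simplified encoding, not the paper's: the precedence is folded into an admissibility automaton with states 0 = empty and 1 = full, so a get simply blocks until the put has happened).

```python
# Sketch of the single-assignment IVar discipline: put is admissible
# only on the empty IVar, get only on the full one.

def admissible(state, m):
    return m == "put" if state == 0 else m == "get"

def step(state, m):
    """Policy transition; a put fills the IVar for the rest of the tick."""
    assert admissible(state, m)
    return 1 if m == "put" else state

s = 0
assert not admissible(s, "get")   # get blocks on the empty IVar
s = step(s, "put")                # the put fills the IVar ...
assert admissible(s, "get")       # ... and unblocks the get
assert not admissible(s, "put")   # no second put: single assignment
```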

**Fig. 3.** Synchronous data-flow network cnt built from control-flow processes.

Now suppose, for modularity, the reg node is pre-compiled into a synchronous IO automaton to be used by mux and inc as a black-box component. Then, reg must be split into three *modes* [20] reg.init, reg.get and reg.set that can be called independently in each instant. The init mode initialises the register memory with 0, the get mode extracts the buffered value and set stores a new value into the register. Since there may be data races if get and set are called concurrently on reg, a policy must be imposed. In the scheduling of Fig. 3b, first reg.get is executed to place the output on zs. Then, reg waits for mux to produce the next value of ys from xs or ws. Finally, reg.set is executed to store the current value of ys for the next tick. Thus, the natural policy for the register is to require that in each tick set is called by at most one thread and, if so, that no concurrent call to get by another thread happens afterwards. In addition, the policy requires init to take place at least once before any set or get. Hence, the policy has two states $P_{\mathsf{reg}} = \{0, 1\}$ with initial state $\varepsilon = 0$ and admissibility such that $0 \vdash_{\mathsf{reg}} \downarrow m$ iff $m = \mathsf{init}$, and $1 \vdash_{\mathsf{reg}} \downarrow m$ for all $m$. The transitions are $0 \triangleright \mathsf{init} = 1$ and $1 \triangleright m = 1$ for all $m \in M_{\mathsf{reg}}$. Further, for coherence, in state 1 no two set calls may be concurrent and every get must take place before any concurrent set. This means we have $1 \vdash_{\mathsf{reg}} m \to \mathsf{set}$ for all $m \in \{\mathsf{get}, \mathsf{set}\}$. Figure 3c shows the partially compiled code in which reg is treated as a compiled object. The policy on reg makes sure the accesses by mux and inc are scheduled in the right way (see Example 4). Note that reg is not an IVar because it has memory.
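The register policy just described can be written down as a small Python sketch (our own encoding of the two-state automaton; the function names are ours, not the paper's).

```python
# Sketch of the register policy P_reg = {0, 1}: init must come first;
# afterwards everything is admissible, with the precedence m -> set
# for m in {get, set}.

INIT, READY = 0, 1

def admissible(state, m):
    """0 |- down m iff m = init; 1 |- down m for all m."""
    return m == "init" if state == INIT else True

def step(state, m):
    """Transitions: 0 > init = 1 and 1 > m = 1."""
    assert admissible(state, m)
    return READY

def precedes(state, m, n):
    """m -> n: concurrent calls of m must happen before n."""
    return state == READY and n == "set" and m in ("get", "set")

assert not admissible(INIT, "get")         # nothing before init
s = step(INIT, "init")
assert admissible(s, "get") and admissible(s, "set")
assert precedes(s, "get", "set")           # concurrent gets precede any set
assert precedes(s, "set", "set")           # set blocks a concurrent set
```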

The cnt example exhibits a general pattern found in the modular compilation of SP: Modules (here reg) may be exercised *several times* in a synchronous tick through *modes* which are executed in a specific *prescribed order*. Mode calls (here reg.set, reg.get) in the same module are coupled via common *shared memory* (here the local variable xv), while mode calls in distinct modules are isolated from each other [15,20].

### **3 Constructive Semantics of DCoL**

To formalise our semantics it is technically expedient to keep track of the *completion status* of each active thread inside the program expression. This results in a syntax for *processes*, distinguished from programs in that each parallel composition $P_1\ {}_{k_1}\|_{k_2}\ P_2$ is labelled with *completion codes* $k_i \in \{\bot, 0, 1\}$ which indicate whether each thread is *waiting* ($k_i = \bot$), *terminated* ($k_i = 0$) or *pausing* ($k_i = 1$). Since we remove a process from the parallel as soon as it terminates, the code $k_i = 0$ cannot actually occur. An expression $P_1 \parallel P_2$ is considered a special case of a process with $k_i = \bot$. The formal semantics is given by a reduction relation on processes

$$
\Sigma; \Pi \vdash P \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P' \tag{2}
$$

specified by the inductive rules in Figs. 4 and 5. The relation (2) determines an instantaneous *sequential reduction step* of process $P$, called an *sstep*, that follows a sequence of enabled method calls $\overline{m} \in M^{*}$ in sequential program order in $P$. This does not include any context switches between concurrent threads inside $P$. For thread communication, several ssteps must be chained up, as described later. The sstep (2) results in an updated memory $\Sigma'$ and residual process $P'$. The subscript $k'$ is a completion code, described below. The reduction (2) is performed in a context consisting of a global memory $\Sigma \in S$ (*must* context) containing the current state of all csm variables and an environment prediction $\Pi \subseteq M^{*}$ (*can* context). The prediction records all potentially outstanding method sequences from threads running *concurrently* with $P$.

We write $\pi_c(\overline{m}) \in M_c^{*}$ for the projection of a method sequence $\overline{m} \in M^{*}$ to variable $c$ and write $\pi_c(\Pi)$ for its lifting to sets of sequences. Prefixing is lifted, too, i.e., $c.m \odot \Pi = \{c.m\,\overline{m} \mid \overline{m} \in \Pi\}$ for any $c.m \in M$.

Performing a method call $c.m(v)$ in $\Sigma;\Pi$ advances the *must* context to $\Sigma \triangleright c.m(v)$ but leaves $\Pi$ unchanged. The sequence of methods $\overline{m} \in M^{*}$ in (2) is *enabled* in $\Sigma;\Pi$, written $[\Sigma,\Pi] \Vdash \downarrow \overline{m}$, meaning that $[(\Sigma.c)^{\#}, \pi_c(\Pi)] \Vdash_c \downarrow \pi_c(\overline{m})$ for all $c \in O$. In this way, the context $[\Sigma,\Pi]$ forms a joint policy state of all variables for the process $P$, in the sense of Sect. 2 (Definition 3).

$$\frac{\Sigma;\Pi \vdash P \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P' \qquad k' \neq 0}{\Sigma;\Pi \vdash P \,;\, Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P' \,;\, Q}\ \mathsf{Seq}_1$$

$$\frac{\Sigma;\Pi \vdash P \overset{\overline{m}_1}{\Longrightarrow} \Sigma' \vdash_{0} P' \qquad \Sigma';\Pi \vdash Q \overset{\overline{m}_2}{\Longrightarrow} \Sigma'' \vdash_{k'} Q'}{\Sigma;\Pi \vdash P \,;\, Q \overset{\overline{m}_1 \overline{m}_2}{\Longrightarrow} \Sigma'' \vdash_{k'} Q'}\ \mathsf{Seq}_2$$

$$\frac{}{\Sigma;\Pi \vdash \mathsf{skip} \overset{\varepsilon}{\Longrightarrow} \Sigma \vdash_{0} \mathsf{skip}}\ \mathsf{Cmp}_1 \qquad \frac{}{\Sigma;\Pi \vdash \mathsf{pause} \overset{\varepsilon}{\Longrightarrow} \Sigma \vdash_{1} \mathsf{pause}}\ \mathsf{Cmp}_2$$

$$\frac{\Sigma;\Pi \vdash P\{\mathsf{rec}\,p.\,P/p\} \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P'}{\Sigma;\Pi \vdash \mathsf{rec}\,p.\,P \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P'}\ \mathsf{Rec}$$

**Fig. 4.** SStep reductions for sequence, completion and recursion.

Most of the rules in Figs. 4 and 5 should be straightforward for the reader familiar with structural operational semantics. Seq1 is the case of a sequential composition $P\,;Q$ where $P$ pauses or waits ($k' \neq 0$) and Seq2 is where $P$ terminates and control passes into $Q$. The statements skip and pause are handled by rules Cmp1 and Cmp2. The rule Rec explains recursion $\mathsf{rec}\,p.\,P$ by syntactic unfolding of the recursion body $P$. All interaction with the memory takes place in the method calls $\mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P$. Rule Let1 is applicable when the method call is enabled, i.e., $[\Sigma,\Pi] \Vdash \downarrow c.m$. Since processes are closed, the argument expression $e$ must evaluate, $\mathit{eval}(e) = v$, and we obtain the new memory $\Sigma \triangleright c.m(v)$ and return value $\Sigma.c.m(v)$. The return value is substituted for the local (stack-allocated) identifier $x$, giving the continuation process $P\{\Sigma.c.m(v)/x\}$, which is run in the updated context $\Sigma \triangleright c.m(v);\Pi$. The prediction $\Pi$ remains the same. The second rule Let2 is used when the method call is blocked or the thread wants to wait and yield to the scheduler. The rules for conditionals Cnd1, Cnd2 are canonical. More interesting are the rules Par1–Par4 for parallel composition, which implement non-deterministic thread switching. It is here that we need to generate predictions and pass them between the threads to exercise the policy control.

The key operation is the computation of the *can*-prediction of a process $P$ to obtain an over-approximation of the set of method sequences potentially executed by $P$. For compositionality we work with sequences $\mathit{can}^s(P) \subseteq M^{*} \times \{0,1\}$ *stoppered* with a completion code: 0 if the sequence ends in termination or

$$\frac{[\Sigma,\Pi] \Vdash \downarrow c.m \qquad \mathit{eval}(e) = v \qquad \Sigma \triangleright c.m(v);\Pi \vdash P\{\Sigma.c.m(v)/x\} \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P'}{\Sigma;\Pi \vdash \mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P \overset{c.m\,\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P'}\ \mathsf{Let}_1$$

$$\frac{}{\Sigma;\Pi \vdash \mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P \overset{\varepsilon}{\Longrightarrow} \Sigma \vdash_{\bot} \mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P}\ \mathsf{Let}_2$$

$$\frac{\mathit{eval}(e) = \mathsf{true} \quad \Sigma;\Pi \vdash P \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P'}{\Sigma;\Pi \vdash \mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P'}\ \mathsf{Cnd}_1 \qquad \frac{\mathit{eval}(e) = \mathsf{false} \quad \Sigma;\Pi \vdash Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} Q'}{\Sigma;\Pi \vdash \mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} Q'}\ \mathsf{Cnd}_2$$

$$\frac{\Sigma;\Pi \otimes \mathit{can}(Q) \vdash P \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} P' \qquad k' \neq 0}{\Sigma;\Pi \vdash P\ {}_{k_P}\|_{k_Q}\ Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k' \sqcap k_Q} P'\ {}_{k'}\|_{k_Q}\ Q}\ \mathsf{Par}_1 \qquad \frac{\Sigma;\Pi \otimes \mathit{can}(Q) \vdash P \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{0} P'}{\Sigma;\Pi \vdash P\ {}_{k_P}\|_{k_Q}\ Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k_Q} Q}\ \mathsf{Par}_2$$

$$\frac{\Sigma;\mathit{can}(P) \otimes \Pi \vdash Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k'} Q' \qquad k' \neq 0}{\Sigma;\Pi \vdash P\ {}_{k_P}\|_{k_Q}\ Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k_P \sqcap k'} P\ {}_{k_P}\|_{k'}\ Q'}\ \mathsf{Par}_3 \qquad \frac{\Sigma;\mathit{can}(P) \otimes \Pi \vdash Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{0} Q'}{\Sigma;\Pi \vdash P\ {}_{k_P}\|_{k_Q}\ Q \overset{\overline{m}}{\Longrightarrow} \Sigma' \vdash_{k_P} P}\ \mathsf{Par}_4$$

**Fig. 5.** SStep reductions for method calls, conditional and parallel.

1 if it ends in pausing. The symbols $\bot_0$, $\bot_1$ and $\top$ are the *terminated*, *paused* and *fully unconstrained can* contexts, respectively, with $\bot_0 = \{(\varepsilon, 0)\}$, $\bot_1 = \{(\varepsilon, 1)\}$ and $\top = M^{*} \times \{0, 1\}$. The set $\mathit{can}^s(P)$, defined in Fig. 6, is extracted from the structure of $P$ using prefixing $c.m \odot \Pi'$, choice $\Pi'_1 \oplus \Pi'_2 = \Pi'_1 \cup \Pi'_2$, parallel composition $\Pi'_1 \otimes \Pi'_2$ and sequential composition $\Pi'_1 \cdot \Pi'_2$. Sequential composition is obtained pairwise on stoppered sequences such that $(\overline{m},0)\cdot(\overline{n},c) = (\overline{m}\,\overline{n},c)$ and $(\overline{m},1)\cdot(\overline{n},c) = (\overline{m},1)$. As a consequence, $\bot_0 \cdot \Pi' = \Pi'$ and $\bot_1 \cdot \Pi' = \bot_1$. Parallel composition is pairwise free interleaving with synchronisation on completion codes. Specifically, a product $(\overline{m},c) \otimes (\overline{n},d)$ generates all interleavings of $\overline{m}$ and $\overline{n}$ with a completion that models a parallel composition which terminates iff both threads terminate and pauses if one of them pauses. Formally, $(\overline{m},c) \otimes (\overline{n},d) = \{(\overline{c}, \mathit{max}(c,d)) \mid \overline{c} \in \overline{m} \otimes \overline{n}\}$. Thus, $\Pi'_P \otimes \Pi'_Q = \bot_0$ iff $\Pi'_P = \bot_0 = \Pi'_Q$, and $\Pi'_P \otimes \Pi'_Q = \bot_1$ iff $\Pi'_P = \bot_1 = \Pi'_Q$, or $\Pi'_P = \bot_0$ and $\Pi'_Q = \bot_1$, or $\Pi'_P = \bot_1$ and $\Pi'_Q = \bot_0$. From $\mathit{can}^s(P)$ we obtain $\mathit{can}(P) \subseteq M^{*}$ by dropping all stopper codes, i.e., $\mathit{can}(P) = \{\overline{m} \mid \exists d.\ (\overline{m},d) \in \mathit{can}^s(P)\}$.
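The stoppered-set operations can be made concrete in a short Python sketch (our own encoding, not the paper's: sequences are tuples, predictions are sets of (sequence, code) pairs with code 0 for terminating and 1 for pausing).

```python
# Sketch of the operations on stoppered prediction sets.

BOT0 = {((), 0)}                       # terminated context
BOT1 = {((), 1)}                       # paused context

def seq(p1, p2):
    """(m,0)·(n,c) = (m n, c) and (m,1)·(n,c) = (m,1), pairwise."""
    return {(m + n, d) if c == 0 else (m, 1)
            for (m, c) in p1 for (n, d) in p2}

def interleavings(m, n):
    """All shuffles of the sequences m and n."""
    if not m: return {n}
    if not n: return {m}
    return ({(m[0],) + w for w in interleavings(m[1:], n)} |
            {(n[0],) + w for w in interleavings(m, n[1:])})

def par(p1, p2):
    """(m,c) ⊗ (n,d): every interleaving, with completion max(c,d)."""
    return {(w, max(c, d))
            for (m, c) in p1 for (n, d) in p2
            for w in interleavings(m, n)}

p = {(("a", "b"), 0)}
assert seq(BOT0, p) == p and seq(BOT1, p) == BOT1
assert par(BOT0, BOT0) == BOT0 and par(BOT1, BOT0) == BOT1
assert par(p, BOT1) == {(("a", "b"), 1)}
```

The assertions replay the identities stated in the text: $\bot_0$ is the unit of sequential composition, $\bot_1$ absorbs, and a parallel pauses as soon as one side pauses.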

The rule Par1 exercises a parallel composition $P\ {}_{k_P}\|_{k_Q}\ Q$ by performing an sstep in $P$. This sstep is taken in the extended context $\Sigma;\Pi \otimes \mathit{can}(Q)$, in which the prediction of the sibling $Q$ is added to the method prediction $\Pi$ for the outer environment

$$\begin{aligned} \mathit{can}^s(\mathsf{skip}) &= \mathit{can}^s(p) = \bot_0 \qquad \mathit{can}^s(\mathsf{pause}) = \bot_1\\ \mathit{can}^s(\mathsf{rec}\,p.\,P) &= \mathit{can}^s(P) \qquad \mathit{can}^s(P \parallel Q) = \mathit{can}^s(P) \otimes \mathit{can}^s(Q)\\ \mathit{can}^s(P\,;Q) &= \begin{cases} \mathit{can}^s(P) & \text{if } \mathit{can}^s(P) \subseteq M^{*} \times \{1\}\\ \mathit{can}^s(P) \cdot \mathit{can}^s(Q) & \text{otherwise} \end{cases}\\ \mathit{can}^s(\mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P) &= c.m \odot \mathit{can}^s(P)\\ \mathit{can}^s(\mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ Q) &= \begin{cases} \mathit{can}^s(P) & \text{if } \mathit{eval}(e) = \mathsf{true}\\ \mathit{can}^s(Q) & \text{if } \mathit{eval}(e) = \mathsf{false}\\ \mathit{can}^s(P) \oplus \mathit{can}^s(Q) & \text{otherwise.} \end{cases} \end{aligned}$$

**Fig. 6.** Computing the *can* prediction.

in which the parent $P \parallel Q$ is running. In this way, $Q$ can block method calls of $P$. When $P$ finally yields as $P'$ with a non-terminating completion code $0 \neq k' \in \{\bot, 1\}$, the parallel completes as $P'\ {}_{k'}\|_{k_Q}\ Q$ with code $k' \sqcap k_Q$. This operation is defined by $k_1 \sqcap k_2 = 1$ if $k_1 = 1 = k_2$, and $k_1 \sqcap k_2 = \bot$ otherwise. When $P$ terminates its sstep with code $k' = 0$, we need rule Par2, which removes the child $P$ from the parallel composition. The rules Par3 and Par4 are symmetrical to Par1 and Par2: they run the right child $Q$ of a parallel $P\ {}_{k_P}\|_{k_Q}\ Q$.
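The completion combination is tiny but worth spelling out; a Python sketch (our encoding, with the string "waiting" standing in for the code $\bot$):

```python
# Sketch of the completion-code combination for parallel composition:
# the parallel pauses only when both children pause, else it waits.

BOT = "waiting"   # stands for the completion code ⊥

def combine(k1, k2):
    return 1 if k1 == 1 == k2 else BOT

assert combine(1, 1) == 1      # both pause: the parallel pauses
assert combine(BOT, 1) == BOT  # one child still waiting: keep waiting
assert combine(1, BOT) == BOT
```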

*Completion and Stability.* A process $P$ is 0-*stable* if $P = \mathsf{skip}$, and 1-*stable* if $P = \mathsf{pause}$, or $P = P'_1\,;P'_2$ and $P'_1$ is 1-stable, or $P = P'_1\ {}_1\|_1\ P'_2$ and both $P'_i$ are 1-stable. A process is *stable* if it is 0-stable or 1-stable. A process expression is *well-formed* if in each sub-expression $P_1\ {}_{k_1}\|_{k_2}\ P_2$ of $P$ the completion annotations match the processes, i.e., if $k_i \neq \bot$ then $P_i$ is $k_i$-stable. Stable processes are well-formed by definition. For stable processes we define a *(syntactic) tick function* $\sigma$ which steps a stable process to the next tick. It is defined such that $\sigma(\mathsf{skip}) = \mathsf{skip}$, $\sigma(\mathsf{pause}) = \mathsf{skip}$, $\sigma(P'_1\,;P'_2) = \sigma(P'_1)\,;P'_2$ and $\sigma(P'_1\ {}_{k_1}\|_{k_2}\ P'_2) = \sigma(P'_1) \parallel \sigma(P'_2)$.
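The tick function is a straightforward structural recursion; a Python sketch over a hypothetical tuple representation of process terms (our own encoding, not from the paper):

```python
# Sketch of the syntactic tick function sigma on stable processes.
# Terms: "skip", "pause", ("seq", P1, P2), ("par", P1, P2).

def tick(p):
    if p == "skip":  return "skip"       # sigma(skip)  = skip
    if p == "pause": return "skip"       # sigma(pause) = skip
    op = p[0]
    if op == "seq":                      # sigma(P1 ; P2) = sigma(P1) ; P2
        return ("seq", tick(p[1]), p[2])
    if op == "par":                      # both children step; codes reset to waiting
        return ("par", tick(p[1]), tick(p[2]))
    raise ValueError("not a stable process")

# The final state of Example 4: (pause; M) || (pause; I) steps to
# (skip; M) || (skip; I), which behaves like M || I.
stable = ("par", ("seq", "pause", "M"), ("seq", "pause", "I"))
assert tick(stable) == ("par", ("seq", "skip", "M"), ("seq", "skip", "I"))
```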

*Example 4.* The data-flow cnt-cmp from Fig. 3c can be represented as a DCoL process of the form $C = \mathsf{reg.init}(0); (M\ {}_\bot\|_\bot\ I)$ with

$$\begin{aligned} M &=_{df} \mathsf{rec}\,p.\ v = \mathsf{xs.pres}();\ P(v);\ \mathsf{pause};\ p\\ P(v) &=_{df} \mathsf{if}\ v\ \mathsf{then}\ \mathsf{reg.set}(0)\ \mathsf{else}\ Q\\ Q &=_{df} u = \mathsf{ws.get}();\ \mathsf{reg.set}(u)\\ I &=_{df} \mathsf{rec}\,q.\ v = \mathsf{reg.get}();\ \mathsf{ws.put}(v+1);\ \mathsf{pause};\ q. \end{aligned}$$

Let us evaluate the process $C$ from an initialised memory $\Sigma_0$ such that $\Sigma_0.\mathsf{xs} = 0 = \Sigma_0.\mathsf{ws}$, and the empty environment prediction $\{\}$.

The first sstep is executed from the context $\Sigma_0;\{\}$ with empty *can* prediction. Note that $\mathsf{reg.init}(0); (M\ {}_\bot\|_\bot\ I)$ abbreviates $\mathsf{let}\ \_ = \mathsf{reg.init}(0)\ \mathsf{in}\ (M\ {}_\bot\|_\bot\ I)$. In context $\Sigma_0;\{\}$ the method call $\mathsf{reg.init}(0)$ is enabled, i.e., $[\Sigma_0, \{\}] \Vdash \downarrow \mathsf{reg.init}$. Since $\mathit{eval}(0) = 0$, we can execute the first method call of $C$ using rule Let1. This advances the memory to $\Sigma_1 = \Sigma_0 \triangleright \mathsf{reg.init}(0)$. The continuation process $M\ {}_\bot\|_\bot\ I$ is now executed in context $\Sigma_1;\bot_0$. The left child $M$ starts with the method call $\mathsf{xs.pres}()$ and the right child $I$ with $\mathsf{reg.get}()$. The latter is admissible, since $(\Sigma_1.\mathsf{reg})^{\#} = 1$. Moreover, get does not need to honour any precedences, whence it is enabled, $[\Sigma_1, \Pi] \Vdash \downarrow \mathsf{reg.get}$ for any $\Pi$. On the other hand, $\mathsf{xs.pres}$ in $M$ is enabled only if $(\Sigma_1.\mathsf{xs})^{\#} = 1$ or if there is no concurrent emit predicted for xs. Indeed, this is the case: the concurrent context of $M$ is $\Pi_I = \{\} \otimes \mathit{can}(I) = \mathit{can}(I) = \{\mathsf{reg.get} \cdot \mathsf{ws.put}\}$. We project $\pi_{\mathsf{xs}}(\Pi_I) = \{\}$ and find $[\Sigma_1, \Pi_I] \Vdash \downarrow \mathsf{xs.pres}$. Hence, we have a non-deterministic choice to take an sstep in $M$ or in $I$. Let us use rule Par1 to run $M$ in context $\Sigma_1;\Pi_I$. By loop unfolding (Rec) and rule Let1 we execute the present test of $M$, which returns the value $\Sigma_1.\mathsf{xs.pres}() = \mathsf{false}$. This leads to an updated memory $\Sigma_2 = \Sigma_1 \triangleright \mathsf{xs.pres}() = \Sigma_1$ and continuation process $P(\mathsf{false}); \mathsf{pause}; M$.
In this (right-associated) sequential composition we first execute $P(\mathsf{false})$, where the conditional rule Cnd2 switches to the else branch $Q$, which is $u = \mathsf{ws.get}(); \mathsf{reg.set}(u)$, still in the context $\Sigma_2;\Pi_I$. The reading of the data-flow variable ws, however, is not enabled, $[\Sigma_2, \Pi_I] \not\Vdash \downarrow \mathsf{ws.get}$, because $(\Sigma_2.\mathsf{ws})^{\#} = 0$ and thus get is not admissible. The sstep blocks with rule Let2:

$$\begin{array}{lll}
\Sigma_2;\Pi_I \vdash Q & \overset{\varepsilon}{\Longrightarrow} \Sigma_2 \vdash_{\bot} Q & \mathsf{Let}_2\\
\Sigma_2;\Pi_I \vdash P(\mathsf{false}) & \overset{\varepsilon}{\Longrightarrow} \Sigma_2 \vdash_{\bot} Q & \mathsf{Cnd}_2\\
\Sigma_2;\Pi_I \vdash P(\mathsf{false});\mathsf{pause};M & \overset{\varepsilon}{\Longrightarrow} \Sigma_2 \vdash_{\bot} Q;\mathsf{pause};M & \mathsf{Seq}_1\\
\Sigma_1;\Pi_I \vdash v = \mathsf{xs.pres}();P(v);\mathsf{pause};M & \overset{\overline{m}_2}{\Longrightarrow} \Sigma_2 \vdash_{\bot} Q;\mathsf{pause};M & \mathsf{Let}_1\ ([\Sigma_1,\Pi_I] \Vdash \downarrow \mathsf{xs.pres})\\
\Sigma_1;\Pi_I \vdash M & \overset{\overline{m}_2}{\Longrightarrow} \Sigma_2 \vdash_{\bot} Q;\mathsf{pause};M & \mathsf{Rec}\\
\Sigma_1;\{\} \vdash M\ {}_\bot\|_\bot\ I & \overset{\overline{m}_2}{\Longrightarrow} \Sigma_2 \vdash_{\bot} (Q;\mathsf{pause};M)\ {}_\bot\|_\bot\ I & \mathsf{Par}_1\\
\Sigma_0;\{\} \vdash C & \overset{\overline{m}_1\overline{m}_2}{\Longrightarrow} \Sigma_2 \vdash_{\bot} (Q;\mathsf{pause};M)\ {}_\bot\|_\bot\ I & \mathsf{Let}_1\ ([\Sigma_0,\bot_0] \Vdash \downarrow \mathsf{reg.init})
\end{array}$$

where $\overline{m}_1 = \mathsf{reg.init}$ and $\overline{m}_2 = \mathsf{xs.pres}$. In the next sstep, from $\Sigma_2;\Pi_Q$ with $\Pi_Q = \{\} \otimes \mathit{can}(Q;\mathsf{pause};M) = \mathit{can}(Q;\mathsf{pause};M) = \{\mathsf{ws.get} \cdot \mathsf{reg.set}\}$, we let the process $I$ execute its $\mathsf{reg.get}()$ with rules Rec and Let1. The return value is $v = \Sigma_2.\mathsf{reg.get}() = 0$. Then, from the updated memory $\Sigma_3 = \Sigma_2 \triangleright \mathsf{reg.get}()$ we run the continuation process $\mathsf{ws.put}(0+1); \mathsf{pause}; I$. The ws.put is enabled if the IVar is empty and there is no concurrent put on ws predicted from $M$. Both conditions hold, since $(\Sigma_3.\mathsf{ws})^{\#} = 0$ and $\pi_{\mathsf{ws}}(\Pi_Q) = \{\mathsf{get}\}$. Therefore, $[\Sigma_3, \Pi_Q] \Vdash \downarrow \mathsf{ws.put}$. With the evaluation $\mathit{eval}(0+1) = 1$ the rule Let1 permits us to update the memory as $\Sigma_4 = \Sigma_3 \triangleright \mathsf{ws.put}(1)$ and continue with process $\mathsf{pause}; I$, which completes by pausing. Formally, this sstep is:

$$\begin{array}{lll}
\Sigma_4;\Pi_Q \vdash \mathsf{pause};I & \overset{\varepsilon}{\Longrightarrow} \Sigma_4 \vdash_{1} \mathsf{pause};I & \mathsf{Cmp}_2, \mathsf{Seq}_1\\
\Sigma_3;\Pi_Q \vdash \mathsf{ws.put}(0{+}1);\mathsf{pause};I & \overset{\overline{m}_4}{\Longrightarrow} \Sigma_4 \vdash_{1} \mathsf{pause};I & \mathsf{Let}_1\ ([\Sigma_3,\Pi_Q] \Vdash \downarrow \mathsf{ws.put})\\
\Sigma_2;\Pi_Q \vdash v = \mathsf{reg.get}();\mathsf{ws.put}(v{+}1);\mathsf{pause};I & \overset{\overline{m}_3\overline{m}_4}{\Longrightarrow} \Sigma_4 \vdash_{1} \mathsf{pause};I & \mathsf{Let}_1\\
\Sigma_2;\Pi_Q \vdash I & \overset{\overline{m}_3\overline{m}_4}{\Longrightarrow} \Sigma_4 \vdash_{1} \mathsf{pause};I & \mathsf{Rec}\\
\Sigma_2;\{\} \vdash (Q;\mathsf{pause};M)\ {}_\bot\|_\bot\ I & \overset{\overline{m}_3\overline{m}_4}{\Longrightarrow} \Sigma_4 \vdash_{\bot} (Q;\mathsf{pause};M)\ {}_\bot\|_1\ (\mathsf{pause};I) & \mathsf{Par}_3
\end{array}$$


where $\overline{m}_3 = \mathsf{reg.get}$ and $\overline{m}_4 = \mathsf{ws.put}$. In the next sstep the waiting method call $u = \mathsf{ws.get}()$ in $Q$ is now admissible and can proceed: $(\Sigma_4.\mathsf{ws})^{\#} = ((\Sigma_3 \triangleright \mathsf{ws.put}(1)).\mathsf{ws})^{\#} = 1$ and thus $[\Sigma_4, \Pi] \Vdash \downarrow \mathsf{ws.get}$ for all $\Pi$. The return value is $u = \Sigma_4.\mathsf{ws.get}() = 1$, the updated memory is $\Sigma_5 = \Sigma_4 \triangleright \mathsf{ws.get}()$ and the continuation process is $\mathsf{reg.set}(1); \mathsf{pause}; M$. The register set method is admissible, since $(\Sigma_4.\mathsf{reg})^{\#} = 1$, and also enabled, because there is no get predicted in the concurrent environment $\bot_0$. Thus, $[\Sigma_5, \bot_0] \Vdash \downarrow \mathsf{reg.set}$. The execution of the method yields the memory $\Sigma_6 = \Sigma_5 \triangleright \mathsf{reg.set}(1)$ with continuation process $\mathsf{pause}; M$, which completes by pausing. This yields the derivation tree:

$$\begin{array}{lll}
\Sigma_6;\{\varepsilon\} \vdash \mathsf{pause};M & \overset{\varepsilon}{\Longrightarrow} \Sigma_6 \vdash_{1} \mathsf{pause};M & \mathsf{Cmp}_2, \mathsf{Seq}_1\\
\Sigma_5;\{\varepsilon\} \vdash \mathsf{reg.set}(1);\mathsf{pause};M & \overset{\overline{m}_6}{\Longrightarrow} \Sigma_6 \vdash_{1} \mathsf{pause};M & \mathsf{Let}_1\\
\Sigma_4;\{\varepsilon\} \vdash Q;\mathsf{pause};M & \overset{\overline{m}_5\overline{m}_6}{\Longrightarrow} \Sigma_6 \vdash_{1} \mathsf{pause};M & \mathsf{Let}_1\\
\Sigma_4;\{\} \vdash (Q;\mathsf{pause};M)\ {}_\bot\|_1\ (\mathsf{pause};I) & \overset{\overline{m}_5\overline{m}_6}{\Longrightarrow} \Sigma_6 \vdash_{1} (\mathsf{pause};M)\ {}_1\|_1\ (\mathsf{pause};I) & \mathsf{Par}_1
\end{array}$$

where $\overline{m}_5 = \mathsf{ws.get}$ and $\overline{m}_6 = \mathsf{reg.set}$. To justify the rule Par1 consider that $\{\} \otimes \mathit{can}(\mathsf{pause};I) = \{\} \otimes \{\} = \{\}$. At this point we have reached a 1-stable process. With the tick function we advance to the next tick, $\sigma((\mathsf{pause};M)\ {}_1\|_1\ (\mathsf{pause};I)) = (\mathsf{skip};M)\ {}_\bot\|_\bot\ (\mathsf{skip};I)$, which behaves like $M\ {}_\bot\|_\bot\ I$.

### **3.1 Determinacy, Termination and Constructiveness**

Determinacy of DCoL is the result of two components: monotonicity of policy-conformant scheduling and csm coherence. Monotonicity ensures that whenever a method is executable and policy-enabled, it remains policy-enabled under arbitrary ssteps of the environment. Symmetrically, the environment cannot be blocked by a thread taking policy-enabled computation steps.

The second building block for determinacy is csm variable coherence. Consider a context $\Sigma; \Pi_Q$ in which we run an sstep of P with prediction $\Pi_Q$ for the concurrent process Q, resulting in a final memory $\Sigma^P$ arising from executing a sequence $m_P$ of method calls from P. Because of the policy constraint, the sequence $m_P$ must be enabled under all predictions $n \in \Pi_Q$, i.e., $[\Sigma, n] \vdash \downarrow m_P$. Suppose, on the other side, we sstep the process Q in the same memory $\Sigma$ with prediction $\Pi_P$ for P, resulting in an action sequence $m_Q$ and final memory $\Sigma^Q$. Then, by the same reasoning, $[\Sigma, n] \vdash \downarrow m_Q$ for all $n \in \Pi_P$. But since $m_P$ is an actual execution of P it must be in the prediction for P, i.e., $m_P \in \Pi_P$, and symmetrically, $m_Q \in \Pi_Q$. But then we have $[\Sigma, m_Q] \vdash \downarrow m_P$ and $[\Sigma, m_P] \vdash \downarrow m_Q$, which means the two sequences are mutually enabled in $\Sigma$. Now if the semantics of method calls is policy-coherent then monotonicity can be exploited to derive a confluence property for processes, which guarantees that $m_P$ can still be executed by P in state $\Sigma^Q$ and $m_Q$ by Q in state $\Sigma^P$, and both lead to the same final memory.

**Theorem 1 (Diamond Property).** *If all* csm *variables are policy-coherent then the sstep semantics is confluent. Formally, given two derivations* $\Sigma; \Pi \vdash P \stackrel{m_1}{\Longrightarrow} \Sigma_1 \vdash_{k_1} P_1$ *and* $\Sigma; \Pi \vdash P \stackrel{m_2}{\Longrightarrow} \Sigma_2 \vdash_{k_2} P_2$*, there exist* $\Sigma'$*,* $k$ *and* $P'$ *such that* $\Sigma_1; \Pi \vdash P_1 \stackrel{n_1}{\Longrightarrow} \Sigma' \vdash_{k} P'$ *and* $\Sigma_2; \Pi \vdash P_2 \stackrel{n_2}{\Longrightarrow} \Sigma' \vdash_{k} P'$*.*

Theorem 1 shows that no matter how we schedule the ssteps of local threads to create an sstep of a parallel composition, the final result will not diverge. This does not guarantee completion of a process. However, it implies that the question of whether P blocks or makes progress does not depend on the order in which concurrent threads are scheduled. Either a process completes or it does not. All ssteps in a process can be scheduled with maximal parallelism without interference.

A main program P is run at the top level in an "environmentally closed" form of ssteps (2), where the prediction is empty, $\Pi = \{\}$, and thus acts neutrally. We iterate such ssteps to construct a macro-step reaction. Let us write

$$
\Sigma \vdash P \Rightarrow \Sigma' \vdash P' \tag{3}
$$

if there exist $k$, $m$ such that $\Sigma; \bot_0 \vdash P \stackrel{m}{\Longrightarrow} \Sigma' \vdash_{k} P'$. The relation $\Longrightarrow$ is well-founded for clock-guarded processes in the sense that it has no infinite chains.

**Theorem 2 (Termination).** *Let* $P_0, P_1, P_2, \ldots$ *and* $\Sigma_0, \Sigma_1, \Sigma_2, \ldots$ *be infinite sequences of processes and memories, respectively, with* $\Sigma_i \vdash P_i \Longrightarrow \Sigma_{i+1} \vdash P_{i+1}$*. If* $P_0$ *is clock-guarded then there is* $n \ge 0$ *with* $\Sigma_n = \Sigma_i$*,* $P_n = P_i$ *for all* $i \ge n$*.*

The fixed-point semantics iterates (3) until it reaches a $P^*$ such that $\Sigma^* \vdash P^* \Longrightarrow \Sigma^* \vdash P^*$. By the Termination Theorem 2 such a fixed point must exist for clock-guarded processes. If $\mathit{can}_s(P^*) = \bot_0$ then $P^*$ is 0-stable and the program P has terminated. If $\mathit{can}_s(P^*) = \bot_1$, the residual $P^*$ is pausing.
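The iteration scheme can be sketched abstractly. The following is a minimal Python sketch, not the paper's formal semantics: `sstep` stands for a hypothetical deterministic step function on states, and the loop plays the role of iterating relation (3) until the fixed point that Theorem 2 guarantees for clock-guarded processes.

```python
def iterate_to_fixed_point(sstep, state):
    """Iterate ssteps until a fixed point is reached.

    `sstep` is a hypothetical function mapping a state (think: a
    memory/process pair) to its successor. Termination corresponds
    to the well-foundedness guaranteed by Theorem 2.
    """
    while True:
        nxt = sstep(state)
        if nxt == state:   # analogue of Sigma* |- P* ==> Sigma* |- P*
            return state
        state = nxt

# Toy step function that saturates at 5: the fixed point is 5.
result = iterate_to_fixed_point(lambda n: min(n + 1, 5), 0)
```

The same skeleton underlies any fixed-point semantics: the only DCoL-specific ingredients are the sstep relation itself and the termination argument.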

**Definition 6 (Macro Step).** *A* run $\Sigma \vdash P \Rrightarrow \Sigma' \vdash P'$ *is a sequence of ssteps with processes* $P_0, P_1, P_2, \ldots, P_n$ *and sequences of method calls* $m_1, m_2, \ldots, m_n$ *such that for all* $1 \le i \le n$*,*

$$
\Sigma\_{i-1}; \perp\_0 \vdash P\_{i-1} \Rightarrow \Sigma\_i \vdash\_{k\_i} P\_i
$$

*where* $P_0 = P$*,* $\Sigma_0 = \Sigma$*,* $\Sigma_n = \Sigma'$ *and* $P_n = P'$*. A run is called a* macro-step *if it is maximal, i.e., if* $\Sigma' \vdash P' \Longrightarrow \Sigma'' \vdash P''$ *implies* $\Sigma'' = \Sigma'$ *and* $P'' = P'$*. The macro-step is called* stabilising *if the final* $P'$ *is stable, i.e.,* $k_n = \bot$*, and the clock is admissible, i.e., if* $(\Sigma'.c)\#\,\sigma$ *is defined for all* $c \in O$*. The macro-step is* pausing *if* $k_n = 1$ *and* terminating *if* $k_n = 0$*.*

Given a pausing macro-step $\Sigma \vdash P \Rrightarrow \Sigma' \vdash P'$, the next tick starts with process $\sigma(P')$ in a memory $\Sigma''$ such that $(\pi_c(\Sigma'))\# \stackrel{\sigma}{\longrightarrow} (\pi_c(\Sigma''))\#$ for all $c \in O$. This only constrains the abstract policy state of each variable in $\Sigma''$, not its memory content. In this way, csm variables can introduce an arbitrary new memory $\Sigma''$ with every clock tick.

**Theorem 3 (Macro-step Determinism).** *If all* csm *variables are policy-coherent then for two macro-steps* $\Sigma \vdash P \Rrightarrow \Sigma_1 \vdash P_1$ *and* $\Sigma \vdash P \Rrightarrow \Sigma_2 \vdash P_2$ *we have* $\Sigma_1 = \Sigma_2$ *and* $P_1 = P_2$*.*

**Definition 7 (Constructiveness).** *A program* P *is* policy-constructive*, for a set of policy-coherent* csm *variables, if for arbitrary initial memory* Σ *all reachable macro-steps of* P *are stabilising.*

A non-constructive program will, after some tick, end up in a fixed point $P^*$ with $\mathit{can}_s(P^*) \notin \{\bot_0, \bot_1\}$. Then $P^*$ is stuck, with a set of active child threads waiting for each other in a policy-induced cycle.

Finally, we present two important results for DCoL showing that we conservatively extend existing SP semantics. A DCoL program using only sequentially constructive variables [14] (see [17], Sec. 5.7) is called a *DCoL-SC* program. DCoL programs using only pure signals subject to the policy of Example 1 (Fig. 1) are expressively complete for the pure instantaneous fragment of Esterel [4]. Esterel signal emissions emit s are syntactic sugar for s.emit();. A presence test pres s then P else Q abbreviates if s.pres() then P else Q. Sequential composition P ; Q in Esterel behaves like a parallel composition in which the schedule is forced to run P to termination before it can pass control to Q. In DCoL this is (P; s.emit();) || (if s.pres() then Q else skip) with a fresh signal s not occurring in either P or Q. This suggests the following definition: A program P is a *(pure instantaneous) DCoL-Esterel* program if (i) P only uses pure signals, (ii) P does not use pause or rec, and (iii) P does not contain sequentially nested occurrences of signal accesses.

### **Theorem 4 (Esterel and Sequential Constructiveness)**


It is interesting to note that the second statement in Theorem 4 is not invertible (for a counterexample see [17]). Hence, the policy-constructiveness for SC variables induced by our semantics is more restrictive than that given in [14].

### **4 Related Work**

Many languages have been proposed to offer determinism as a fundamental design principle. We consider these attempts under several categories.

*Fixed Protocol for Shared Data.* These approaches introduce a unique protocol for data exchange between concurrent processes. SHIM [21] provides a model for combined hardware-software systems, typical of embedded systems. Here, the concurrent processes communicate using point-to-point (restricted) Kahn channels with blocking reads and writes. SHIM programs are shown to be deterministic by construction, as the states of each process are finite and deterministic and the data produced and consumed over any channel is also deterministic.

Concurrent revisions [19] introduce a generic and deterministic programming model for parallel programming. This model supports fork-join parallelism and processes are allowed to make concurrent modifications to shared data by creating local copies that are eventually merged using suitable (programmer specified) merge functions at join boundaries.

However, like the deterministic SP model [2], both SHIM and concurrent revisions lack support for more expressive shared ADTs essential for programming complex systems. Caromel et al. [22], on the other hand, offer determinism with asynchronously communicating active objects (ADTs) equipped with a process calculus semantics. Here, an active object is a sequential thread. Active objects communicate using *futures* and synchronise via Kahn-MacQueen co-routines [23] for deterministic data exchange. Our approach subsumes Kahn buffers of SHIM and the *local-copy-merge protocol* of concurrent revisions by an appropriate choice of method interface and policy. None of these approaches [19,21,22] uses a clock as a central barrier mechanism like our approach does.

In the Java-derived language X10, clocks are a form of synchronisation barrier for supporting deterministic and deadlock-free patterns of common parallel computations [24]. In contrast to our approach, X10 allows multiple clocks. These, however, are not abstracted inside objects, unlike our clocks, which are encapsulated inside the csm types. Hence X10 clocks are invoked directly by the *activities* (i.e., concurrent threads) of programs, and this manual synchronisation is as error-prone as other unsafe low-level primitives such as locks.

*Coherent Memory Models for Shared Data.* Whether clocked or not, our approach depends on the availability of csm types that are provably coherent for their policy. Besides the standard types of SP (data-flow, sequentially constructive variables, Kahn channels, signals) such csm types can be obtained from existing research on *coherent memory models* [25,26]. Unlike the protocol-oriented approaches above, some approaches have been developed based on coherency of the underlying memory models [26] especially for shared objects.

Bocchino et al. [25] propose deterministic parallel Java (DPJ) which has a type and effect system to ensure that parallel heap accesses remain safe. Data structures such as arrays, trees, and sets can be accessed in parallel as long as accesses can be shown to use non-overlapping regions.

Grace [27] promises a deterministic run-time through the adoption of *fork-join* parallelism combined with memory protection and a sequential commit protocol. However, there is no guarantee on the determinism of such custom synchronisation protocols. These must be verified using expensive proof systems.

A powerful technique to generate coherent shared memory structure for functional programs has recently been proposed by Kuper et al. [28]. They introduce lattice-based data structures, called LVars, in which all write accesses produce a monotonic value increase in the lattice and all read accesses are blocked until the memory value has passed a read-specific threshold. Each variable's domain is organised as a lattice of states with $\bot$ and $\top$ representing an empty new location and an error, respectively. Because of monotonicity all writes are confluent with each other. Since reads are blocking, each LVar data type can be used in DCoL as a coherent csm type of variables with a threshold-determined policy. Note that [25–28] do not consider csm types and [28] also does not treat destructive sequential updates as we do.

Recently Haller et al. [29] have developed Reactive Async, a new event-based asynchronous concurrent programming model that improves on LVars. This approach extends futures and promises<sup>8</sup> with lattice-based operations in order to support destructive updates (refinement of results) in a deterministic concurrent setting. The basic abstractions are (i) *cells*, which define interfaces for reading a value that is asynchronously computed, and (ii) *cell completers*, which allow multiple monotonic updates of values taken from a lattice type class. In contrast to LVars, the model supports concurrent programming with cyclic data dependencies. The mechanism for resolving cycles combines the lattices with quiescence detection on a handler pool (execution context). The quiescence concept refers to a state where the cell values are not going to change anymore. The thread pool is able to detect this quiescent (synchronisation) phase, and when this is the case the resolution of cyclic dependencies and the reading of cells can take place. This is similar to our policies, where the enabling of methods (e.g., read) is a state- and prediction-dependent notion. Our developments may offer a theoretical background for the cell interfaces of this model. In Reactive Async the concurrent code is guaranteed to be deterministic provided that the API is used appropriately, but this is not checked statically. It would be interesting to investigate whether our theory can contribute on this front. In the other direction, Reactive Async manages inter-cell dependencies, which might support global policies between different csm variables in our setting.

*Clock-Driven Encapsulation.* Encapsulation is not entirely unknown in reactive programming. The idea of a *reactive object model (ROM)* [30] was first introduced by Boussinot et al. and subsequently refined [31] and combined with standards such as UML [32]. Here a program is a collection of reactive objects that operate synchronously relative to a global clock, similar to SP. Each object encapsulates a set of methods and data, where the methods share this data. ROM relied on a simplifying assumption whereby each method invocation is separated into instants.

André et al. [33] generalised the ROM idea to that of *synchronous objects*, which behave like synchronous modules (in Esterel or Lustre). The program is divided into a collection of synchronous and standard objects. While the latter interact using messages, the former use *signals* as in SP. Communication between standard and synchronous objects has to be managed using special *interface objects*. The framework supports features such as aggregation, encapsulation, and inheritance, yet communication is restricted to standard Esterel-style signals. However, the issue of determinism for the composition of synchronous objects with standard objects is not considered.

<sup>8</sup> A future can asynchronously be completed with a value of the appropriate type or it can fail with an exception. A promise allows completing a future at most once.

A concrete implementation of synchronous objects in Java is proposed in [34]. Here, a run-time system is used to provide a cyclic schedule of the objects during an instant. This approach assumes that outputs from the objects can be read only in the next instant (similar to the SL programming language [35]) and so does not support instantaneous communication like we do.

Synchronous objects arise naturally in modular compilation [15,36,37]. The first time these have been exposed at the language level is in [20]. That work has inspired our use of policies. While [20] offers a mechanism for deterministic management of shared variables through ADT-like interfaces, it has three serious limitations: (1) Modes express data-flow equations rather than imperative method procedures and so are not directly suitable for control-flow programming; (2) Policies do not distinguish between two methods being called *sequentially* by the *same* thread, which can be permitted, and two methods being called by *different* threads in *parallel*, which may have to be prohibited. This makes policies too restrictive in the light of the recent, more liberal notion of sequential constructiveness [14]. Most importantly, (3) the notion of policy-soundness does not use policies *prescriptively* as a contract to be fulfilled by the scheduler but only *descriptively* as an invariant of the program code. Hence, policies in [20] cannot be used to generalise the semantics of SP signals to shared ADTs.

The sequentially constructive model of synchronous computation [14] has shown how the constructive semantics of Esterel can be reconstructed from a scheduling view as standard destructive variables plus a synchronisation protocol. SCL acts as an intermediate language for the graphical language SCCharts [38] and the textual language SCEst [18], which are proposed as sequentially constructive extensions of the well-known control-flow languages SyncCharts [39] and Esterel [4]. By presenting our new analysis of sequential constructiveness for SCL, our results become applicable to both SCCharts and SCEst.

The term 'constructive' semantics was coined by Berry [4]. In [40] it was shown how it can be recast as a fixed point in an interval domain, which we generalise here to policy states $[\mu, \gamma]$. Talpin et al. [13] recently gave a constructive semantics of multi-clock synchronous programs. It is an open problem how our approach could be generalised to multiple clocks.

### **5 Conclusion**

This work extends the SP theoretical foundations to allow communication at higher levels of abstraction. The paper explains deterministic concurrency of SP as a property derived from csm types. Our results extend the SP notion of constructiveness to general shared csm types. We have made some simplifying assumptions that render the theory somewhat less general than it could be. A first limitation is our assumption that all method calls are atomic. We believe the theory can be generalised to non-atomic methods, albeit at the price of a significant increase in the complexity of calculating *can* predictions. Second, method parameters are passed "by value" rather than "by reference". This is necessary for having types as black boxes ready to use. Passing method parameters "by reference" would also introduce aliasing issues which we do not address. Third, in our present setting the policy update $\mu \stackrel{m}{\longrightarrow}$ does not observe method parameters. This is an abstraction to facilitate static analyses. In principle, to increase expressiveness, the method parameters could be included too, but this would again complicate the over-approximation of *can* information.

**Acknowledgement.** We thank Philipp Haller, Adrien Guatto and the three anonymous reviewers for their insightful comments and suggestions helping us improve the paper. This work has been supported by the German Research Council (DFG) under grant number ME-1427/6-2.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Probabilistic Programming

## **An Assertion-Based Program Logic for Probabilistic Programs**

Gilles Barthe<sup>1</sup>, Thomas Espitau<sup>2</sup>, Marco Gaboardi<sup>3</sup>, Benjamin Grégoire<sup>4</sup>, Justin Hsu<sup>5(B)</sup>, and Pierre-Yves Strub<sup>6</sup>

> <sup>1</sup> IMDEA Software Institute, Madrid, Spain <sup>2</sup> Université Paris 6, Paris, France <sup>3</sup> University at Buffalo, SUNY, Buffalo, USA <sup>4</sup> Inria Sophia Antipolis–Méditerranée, Nice, France

<sup>5</sup> University College London, London, UK

<sup>6</sup> École Polytechnique, Palaiseau, France

**Abstract.** We present Ellora, a sound and relatively complete assertion-based program logic, and demonstrate its expressivity by verifying several classical examples of randomized algorithms using an implementation in the EasyCrypt proof assistant. Ellora features new proof rules for loops and adversarial code, and supports richer assertions than existing program logics. We also show that Ellora allows convenient reasoning about complex probabilistic concepts by developing a new program logic for probabilistic independence and distribution law, and then smoothly embedding it into Ellora.

### **1 Introduction**

The most mature systems for deductive verification of randomized algorithms are *expectation-based* techniques; seminal examples include PPDL [28] and pGCL [34]. These approaches reason about *expectations*, functions E from states to real numbers,<sup>1</sup> propagating them backwards through a program until they are transformed into a mathematical function of the input. Expectation-based systems are both theoretically elegant [16,23,24,35] and practically useful; implementations have verified numerous randomized algorithms [19,21]. However, properties involving multiple probabilities or expected values can be cumbersome to verify—each expectation must be analyzed separately.
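To make the backward style concrete, here is a minimal Python sketch (ours, not the pGCL formalism itself): for the program `x ~ uniform{0,1}; y := x + 1` and post-expectation E = y, propagating E backwards through the two statements yields a pre-expectation that is the constant 1.5. The names `wp_assign` and `wp_sample` are illustrative.

```python
# Post-expectation E over the final state (x, y).
post = lambda x, y: y

def wp_assign(E):
    # Propagate E backwards through "y := x + 1": substitute x + 1 for y.
    return lambda x, y: E(x, x + 1)

def wp_sample(E):
    # Propagate backwards through "x ~ uniform{0, 1}": average over x.
    return lambda y: 0.5 * E(0, y) + 0.5 * E(1, y)

pre = wp_sample(wp_assign(post))
# pre(y) = 0.5 * (0 + 1) + 0.5 * (1 + 1) = 1.5, independently of the
# initial y: the expected final value of y is 1.5 from any start state.
```

Each expectation is threaded through the program separately, which is exactly why properties involving several probabilities or expected values become cumbersome in this style.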

An alternative approach envisioned by Ramshaw [37] is to work with predicates over distributions. A direct comparison with expectation-based techniques

This is the conference version of the paper.

<sup>1</sup> Treating a program as a function from input states s to output distributions μ(s), the expected value of E on μ(s) is an expectation.

**Electronic supplementary material** The online version of this chapter (https:// doi.org/10.1007/978-3-319-89884-1 5) contains supplementary material, which is available to authorized users.

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 117–144, 2018. https://doi.org/10.1007/978-3-319-89884-1\_5

is difficult, as the approaches are quite different. In broad strokes, assertion-based systems can verify richer properties in one shot and have specifications that are arguably more intuitive, especially for reasoning about loops, while expectation-based approaches can transform expectations mechanically and can reason about non-determinism. However, the comparison is not very meaningful for an even simpler reason: existing assertion-based systems such as [8,18,38] are not as well developed as their expectation-based counterparts.


These limitations raise two points. Compared to expectation-based approaches:


In this paper, we give positive evidence for both of these points.<sup>2</sup> Towards the first point, we give a new assertion-based logic Ellora for probabilistic programs, overcoming limitations in existing probabilistic program logics. Ellora supports a rich set of assertions that can express concepts like expected values and probabilistic independence, and novel proof rules for verifying loops and adversarial code. We prove that Ellora is sound and relatively complete.

Towards the second point, we evaluate Ellora in two ways. First, we define a new logic for proving probabilistic independence and distribution law

<sup>2</sup> Note that we do not give mathematically precise formulations of these points; as we are interested in the practical verification of probabilistic programs, a purely theoretical answer would not address our concerns.

properties—which are difficult to capture with expectation-based approaches—and then embed it into Ellora. This sub-logic is more narrowly focused than Ellora, but supports more concise reasoning for the target assertions. Our embedding demonstrates that the assertion-based approach can be flexibly integrated with intuitive, special-purpose reasoning principles. To further support this claim, we also provide an embedding of the Union Bound logic, a program logic for reasoning about accuracy bounds [4]. Then, we develop a full-featured implementation of Ellora in the EasyCrypt theorem prover and exercise the logic by mechanically verifying a series of complex randomized algorithms. Our results suggest that the assertion-based approach can indeed be practically viable.

*Abstract Logic.* To ease the presentation, we present Ellora in two stages. First, we consider an abstract version of the logic where assertions are general predicates over distributions, with no compact syntax. Our abstract logic makes two contributions: reasoning for loops, and for adversarial code.

*Reasoning About Loops.* Proving a property of a probabilistic loop typically requires establishing a loop invariant, but the class of loop invariants that can be soundly used depends on the termination behavior—stronger termination assumptions allow richer loop invariants. We identify three classes of assertions that can be used for reasoning about probabilistic loops, and provide a proof rule for each one:


The definition of topologically closed assertion is reminiscent of Ramshaw [37]; the stronger notion of downwards closed assertion appears to be new.

Besides broadening the class of loops that can be analyzed, our rules often enable simpler proofs. For instance, if the loop is certainly terminating, then there is no need to prove semantic side-conditions. Likewise, there is no need to consider the termination behavior of the loop when the invariant is downwards and topologically closed. For example, in many applications in cryptography, the target property is that a "bad" event has low probability: Pr [E] ≤ k. In our framework this assertion is downwards and topologically closed, so it can be a loop invariant regardless of the termination behavior.
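The downwards-closure of Pr [E] ≤ k can be checked numerically. The sketch below (dict-based sub-distributions, our own encoding) shows that shrinking a sub-distribution pointwise can only decrease the probability of the bad event, so the bound survives:

```python
def pr(mu, event):
    """Probability of `event` under the sub-distribution `mu`,
    represented as a finite mass-function dict."""
    return sum(p for a, p in mu.items() if event(a))

k = 0.25
mu2 = {"bad": 0.2, "ok": 0.8}               # Pr[bad] = 0.2 <= k
mu1 = {a: 0.5 * p for a, p in mu2.items()}  # any pointwise-smaller mu1

bad = lambda a: a == "bad"
# pointwise decrease => Pr[bad] can only shrink, so the bound is kept
assert pr(mu1, bad) <= pr(mu2, bad) <= k
```

The same monotonicity argument works for every event E, which is what makes such assertions usable as loop invariants without termination side-conditions.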

*Reasoning About Adversaries.* Existing assertion-based logics cannot reason about probabilistic programs with *adversarial* code. *Adversaries* are special probabilistic procedures consisting of an interface listing the concrete procedures that an adversary can call (*oracles*), along with restrictions like how many calls an adversary may make. Adversaries are useful in cryptography, where security notions are described using experiments in which adversaries interact with a challenger, and in game theory and mechanism design, where adversaries can represent strategic agents. Adversaries can also model inputs to *online* algorithms.

We provide proof rules for reasoning about adversary calls. Our rules are significantly more general than previously considered rules for reasoning about adversaries. For instance, the adversary rule used in [4] is restricted to adversaries that cannot make oracle calls.

*Metatheory.* We show soundness and relative completeness of the core abstract logic, with mechanized proofs in the Coq proof assistant.

*Concrete Logic.* While the abstract logic is conceptually clean, it is inconvenient for practical formal verification—the assertions are too general and the rules involve semantic side-conditions. To address these issues, we flesh out a concrete version of Ellora. Assertions are described by a grammar modeling a two-level assertion language. The first level contains *state predicates*—deterministic assertions about a single memory—while the second layer contains *probabilistic predicates* constructed from probabilities and expected values over discrete distributions. While the concrete assertions are theoretically less expressive than their counterparts in the abstract logic, they can already encode common properties and notions from existing proofs, like probabilities, expected values, distribution laws and probabilistic independence. Our assertions can express theorems from probability theory, enabling sophisticated reasoning about probabilistic concepts.

Furthermore, we leverage the concrete syntax to simplify verification.


*Implementation and Case Studies.* We implement Ellora on top of EasyCrypt, a general-purpose proof assistant for reasoning about probabilistic programs, and we mechanically verify a diverse collection of examples including textbook algorithms and a randomized routing procedure. We develop an EasyCrypt formalization of probability theory from the ground up, including tools like concentration bounds (e.g., the Chernoff bound), Markov's inequality, and theorems about probabilistic independence.

*Embeddings.* We propose a simple program logic for proving *probabilistic independence*. This logic is designed to reason about independence in a lightweight way, as is common in paper proofs. We prove that the logic can be embedded into Ellora, and is therefore sound. Furthermore, we prove an embedding of the Union Bound logic [4].

### **2 Mathematical Preliminaries**

As is standard, we will model randomized computations using *sub-distributions*.

**Definition 1.** *A* sub-distribution *over a set* A *is defined by a mass function* μ : A → [0, 1] *that gives the probability of the unitary events* a ∈ A*. This mass function must be such that* $\sum_{a \in A} \mu(a)$ *is well-defined and* $|\mu| \triangleq \sum_{a \in A} \mu(a) \le 1$*. In particular, the* support $\mathrm{supp}(\mu) \triangleq \{a \in A \mid \mu(a) \neq 0\}$ *is discrete.*<sup>3</sup> *The name "sub-distribution" emphasizes that the total probability may be strictly less than* 1*. When the* weight |μ| *is equal to* 1*, we call* μ *a* distribution*. We let* **SDist**(A) *denote the set of sub-distributions over* A*. The probability of an event* E(x) *w.r.t. a sub-distribution* μ*, written* $\Pr_{x \sim \mu}[E(x)]$*, is defined as* $\sum_{x \in A \mid E(x)} \mu(x)$*.*

Simple examples of sub-distributions include the *null sub-distribution* **0**, which maps each element of the underlying space to 0; and the *Dirac distribution centered on* x, written δx, which maps <sup>x</sup> to 1 and all other elements to 0. The following standard construction gives a monadic structure to sub-distributions.
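Discrete sub-distributions are easy to model concretely. The following Python sketch represents a sub-distribution as a finite mass-function dict; the names `weight`, `support`, `pr`, `null`, and `dirac` are ours, chosen to mirror Definition 1 and the examples above:

```python
def weight(mu):
    # |mu| = sum of all masses; <= 1 for a sub-distribution
    return sum(mu.values())

def support(mu):
    # supp(mu) = elements with non-zero mass
    return {a for a, p in mu.items() if p != 0}

def pr(mu, event):
    # Pr_{x ~ mu}[event(x)]
    return sum(p for a, p in mu.items() if event(a))

null = {}                       # the null sub-distribution 0 (weight 0)

def dirac(x):                   # Dirac distribution centered on x
    return {x: 1.0}
```

For instance, `weight(dirac("a"))` is 1.0, while `weight(null)` is 0, matching the two examples in the text.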

**Definition 2.** *Let* <sup>μ</sup> <sup>∈</sup> **SDist**(A) *and* <sup>f</sup> : <sup>A</sup> <sup>→</sup> **SDist**(B)*. Then* <sup>E</sup>a∼μ[f] <sup>∈</sup> **SDist**(B) *is defined by*

$$\mathbb{E}\_{a \sim \mu}[f](b) \overset{\triangle}{=} \sum\_{a \in A} \mu(a) \cdot f(a)(b).$$

*We use notation reminiscent of expected values, as the definition is quite similar.*
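The monadic bind of Definition 2 can be sketched in the same dict encoding (again our own illustrative encoding, not the paper's):

```python
def bind(mu, f):
    """E_{a ~ mu}[f]: sample a from mu, then sample from f(a)."""
    out = {}
    for a, p in mu.items():
        for b, q in f(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

coin = {0: 0.5, 1: 0.5}
# Sample x from a fair coin, then add a second independent flip:
two_flips = bind(coin, lambda x: {x: 0.5, x + 1: 0.5})
# two_flips == {0: 0.25, 1: 0.5, 2: 0.25}
```

This is exactly the Kleisli composition that gives sub-distributions their monadic structure: sequencing two probabilistic computations multiplies masses along each path and sums over paths with the same outcome.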

We will need two constructions to model branching statements.

**Definition 3.** *Let* μ1, μ<sup>2</sup> ∈ **SDist**(A) *such that* |μ1| + |μ2| ≤ 1*. Then* μ<sup>1</sup> + μ<sup>2</sup> *is the sub-distribution* μ *such that* μ(a) = μ1(a) + μ2(a) *for every* a ∈ A*.*

**Definition 4.** *Let* <sup>E</sup> <sup>⊆</sup> <sup>A</sup> *and* <sup>μ</sup> <sup>∈</sup> **SDist**(A)*. Then the restriction* <sup>μ</sup><sup>|</sup>E *of* <sup>μ</sup> *to* <sup>E</sup> *is the sub-distribution such that* <sup>μ</sup><sup>|</sup>E(a) = <sup>μ</sup>(a) *if* <sup>a</sup> <sup>∈</sup> <sup>E</sup> *and 0 otherwise.*
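Definitions 3 and 4 are the ingredients for branching: a conditional splits the input sub-distribution by the guard, runs each branch on its restriction, and recombines the results with the pointwise sum. A small sketch in the same dict encoding (ours, for illustration):

```python
def restrict(mu, event):
    # mu|E: keep only the mass on elements satisfying the event
    return {a: p for a, p in mu.items() if event(a)}

def add(mu1, mu2):
    # pointwise sum, defined when |mu1| + |mu2| <= 1
    out = dict(mu1)
    for a, p in mu2.items():
        out[a] = out.get(a, 0.0) + p
    return out

mu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
even = lambda a: a % 2 == 0
# Splitting by a guard and recombining recovers the original: mu|E + mu|~E = mu
recombined = add(restrict(mu, even), restrict(mu, lambda a: not even(a)))
assert recombined == mu
```

With identity branches this recovers the input unchanged; in general each restriction is first pushed through the semantics of the corresponding branch before the sum.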

Sub-distributions are partially ordered under the pointwise order.

**Definition 5.** *Let* $\mu_1, \mu_2 \in \mathbf{SDist}(A)$*. We say* $\mu_1 \leq \mu_2$ *if* $\mu_1(a) \leq \mu_2(a)$ *for every* $a \in A$*, and we say* $\mu_1 = \mu_2$ *if* $\mu_1(a) = \mu_2(a)$ *for every* $a \in A$*.*

We use the following lemma when reasoning about the semantics of loops.

**Lemma 1.** *If* $\mu_1 \leq \mu_2$ *and* $|\mu_1| = 1$*, then* $\mu_1 = \mu_2$ *and* $|\mu_2| = 1$*.*

Sub-distributions are stable under pointwise-limits.

<sup>3</sup> We work with discrete distributions to keep measure-theoretic technicalities to a minimum, though we do not see obstacles to generalizing to the continuous setting.

**Definition 6.** *A sequence* $(\mu_n)_{n\in\mathbb{N}}$ *of sub-distributions in* $\mathbf{SDist}(A)$ converges *if for every* $a \in A$*, the sequence* $(\mu_n(a))_{n\in\mathbb{N}}$ *of real numbers converges. The* limit sub-distribution *is defined as*

$$\mu_{\infty}(a) \stackrel{\triangle}{=} \lim_{n \to \infty} \mu_n(a)$$

*for every* $a \in A$*. We write* $\lim_{n\to\infty} \mu_n$ *for* $\mu_\infty$*.*

**Lemma 2.** *Let* (μn)n∈<sup>N</sup> *be a convergent sequence of sub-distributions. Then for any event* E(x)*, we have:*

$$\Pr_{x \sim \mu_{\infty}}[E(x)] = \lim_{n \to \infty} \Pr_{x \sim \mu_{n}}[E(x)].$$

Any bounded increasing real sequence has a limit; the same is true of sub-distributions.

**Lemma 3.** *Let* $(\mu_n)_{n\in\mathbb{N}} \in \mathbf{SDist}(A)$ *be an increasing sequence of sub-distributions. Then this sequence converges to a limit* $\mu_\infty$*, and* $\mu_n \leq \mu_\infty$ *for every* $n \in \mathbb{N}$*. In particular, for any event* $E$*, we have* $\Pr_{x\sim\mu_n}[E] \leq \Pr_{x\sim\mu_\infty}[E]$ *for every* $n \in \mathbb{N}$*.*

### **3 Programs and Assertions**

Now, we introduce our core programming language and its denotational semantics.

*Programs.* We base our development on pWhile, a strongly typed imperative language with deterministic assignments, probabilistic assignments, conditionals, loops, and an **abort** statement which halts the computation with no result. A probabilistic assignment $x \xleftarrow{\$} g$ assigns a value sampled from a distribution $g$ to a program variable $x$. The syntax of statements is defined by the grammar:

$$\begin{aligned} s &::= \mathbf{skip} \mid \mathbf{abort} \mid x \leftarrow e \mid x \xleftarrow{\$} g \mid s; s\\ &\mid\ \mathbf{if}\ e\ \mathbf{then}\ s\ \mathbf{else}\ s \mid \mathbf{while}\ e\ \mathbf{do}\ s \mid x \leftarrow \mathcal{I}(e) \mid x \leftarrow \mathcal{A}(e) \end{aligned}$$

where x, e, and g range over typed variables in X , expressions in E and distribution expressions in D respectively. The set E of well-typed expressions is defined inductively from X and a set F of function symbols, while the set D of well-typed distribution expressions is defined by combining a set of distribution symbols S with expressions in E. Programs may call a set I of internal procedures as well as a set A of external procedures. We assume that we have code for internal procedures but not for external procedures—we only know indirect information, like which internal procedures they may call. Borrowing a convention from cryptography, we call internal procedures *oracles* and external procedures *adversaries*.

*Semantics.* The denotational semantics of programs is adapted from the seminal work of [27] and interprets programs as sub-distribution transformers. We view

$$\begin{aligned}
[\![\mathbf{skip}]\!]_m &= \delta_m\\
[\![\mathbf{abort}]\!]_m &= \mathbf{0}\\
[\![x \leftarrow e]\!]_m &= \delta_{m[x := [\![e]\!]_m]}\\
[\![x \xleftarrow{\$} g]\!]_m &= \mathbb{E}_{v \sim [\![g]\!]_m}[\delta_{m[x := v]}]\\
[\![s_1; s_2]\!]_m &= \mathbb{E}_{m' \sim [\![s_1]\!]_m}[[\![s_2]\!]_{m'}]\\
[\![\mathbf{if}\ e\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2]\!]_m &= \mathbf{if}\ [\![e]\!]_m\ \mathbf{then}\ [\![s_1]\!]_m\ \mathbf{else}\ [\![s_2]\!]_m\\
[\![\mathbf{while}\ e\ \mathbf{do}\ s]\!]_m &= \lim_{n \to \infty}\, [\![(\mathbf{if}\ e\ \mathbf{then}\ s)^n;\ \mathbf{if}\ e\ \mathbf{then}\ \mathbf{abort}]\!]_m\\
[\![x \leftarrow \mathcal{I}(e)]\!]_m &= [\![f_{\mathbf{arg}} \leftarrow e;\ f_{\mathbf{body}};\ x \leftarrow f_{\mathbf{res}}]\!]_m\\
[\![x \leftarrow \mathcal{A}(e)]\!]_m &= [\![a_{\mathbf{arg}} \leftarrow e;\ a_{\mathbf{body}};\ x \leftarrow a_{\mathbf{res}}]\!]_m
\end{aligned}$$

**Fig. 1.** Denotational semantics of commands

states as type-preserving mappings from variables to values; we write $\mathbf{State}$ for the set of states and $\mathbf{SDist}(\mathbf{State})$ for the set of probabilistic states. For each procedure name $f \in \mathcal{I} \cup \mathcal{A}$, we assume a set $\mathcal{X}^L_f \subseteq \mathcal{X}$ of *local variables* such that the sets $\mathcal{X}^L_f$ are pairwise disjoint. The other variables $\mathcal{X} \setminus \bigcup_f \mathcal{X}^L_f$ are *global variables*.

To define the interpretation of expressions and distribution expressions, we let $[\![e]\!]_m$ denote the interpretation of expression $e$ with respect to a state $m$, and $[\![e]\!]_\mu$ denote the interpretation of $e$ with respect to an initial sub-distribution $\mu$ over states, defined by the clause $[\![e]\!]_\mu \triangleq \mathbb{E}_{m\sim\mu}[[\![e]\!]_m]$. Likewise, we define the semantics of commands in two stages: first interpreted in a single input memory, then interpreted in an input sub-distribution over memories.

**Definition 7.** *The semantics of commands are given in Fig. 1.*


We briefly comment on loops. The semantics of a loop $\mathbf{while}\ e\ \mathbf{do}\ c$ is defined as the limit of its lower approximations, where the $n$-th *lower approximation* of $[\![\mathbf{while}\ e\ \mathbf{do}\ c]\!]_\mu$ is $[\![(\mathbf{if}\ e\ \mathbf{then}\ c)^n;\ \mathbf{if}\ e\ \mathbf{then}\ \mathbf{abort}]\!]_\mu$, where $\mathbf{if}\ e\ \mathbf{then}\ c$ is shorthand for $\mathbf{if}\ e\ \mathbf{then}\ c\ \mathbf{else}\ \mathbf{skip}$ and $c^n$ is the $n$-fold composition $c; \cdots; c$. Since this sequence is increasing, the limit is well-defined by Lemma 3. In contrast, the $n$-th *approximation* of $[\![\mathbf{while}\ e\ \mathbf{do}\ c]\!]_\mu$, defined by $[\![(\mathbf{if}\ e\ \mathbf{then}\ c)^n]\!]_\mu$, may not converge, since the approximations are not necessarily increasing. However, in the special case where the output distribution has weight 1, the $n$-th lower approximations and the $n$-th approximations have the same limit.
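The limit construction can be checked numerically. The sketch below (our own dictionary-based encoding, not the paper's formalization) computes the $n$-th lower approximations of the geometric loop "while $x = 1$ do $x \xleftarrow{\$} \mathrm{coin}$" and shows that their weights increase toward 1:

```python
# Lower approximations [[(if e then s)^n; if e then abort]] of a geometric
# loop, computed on dict-based sub-distributions. Illustrative sketch only.

def bind(mu, f):
    """Monadic bind: push each outcome of mu through f, weighted by mu."""
    out = {}
    for a, p in mu.items():
        for b, q in f(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

def guarded_body(x):
    """if x = 1 then x <-$ coin else skip."""
    return {0: 0.5, 1: 0.5} if x == 1 else {x: 1.0}

def guarded_abort(x):
    """if x = 1 then abort else skip (drops the mass still in the guard)."""
    return {} if x == 1 else {x: 1.0}

def lower_approx(n, mu):
    """The n-th lower approximation of the loop, started from mu."""
    for _ in range(n):
        mu = bind(mu, guarded_body)
    return bind(mu, guarded_abort)

# Starting from x = 1, the weights form the increasing sequence
# 0, 1/2, 3/4, 7/8, 15/16, ... converging to 1.
weights = [sum(lower_approx(n, {1: 1.0}).values()) for n in range(5)]
```

The increasing weights illustrate why the limit of lower approximations is well-defined (Lemma 3), while the plain approximations would retain mass on the guard state.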

**Lemma 4.** *If the sub-distribution* $[\![\mathbf{while}\ e\ \mathbf{do}\ c]\!]_\mu$ *has weight* 1*, then the limit of* $[\![(\mathbf{if}\ e\ \mathbf{then}\ c)^n]\!]_\mu$ *is defined and*

$$\lim_{n \to \infty} [\![(\mathbf{if}\ e\ \mathbf{then}\ c)^n;\ \mathbf{if}\ e\ \mathbf{then}\ \mathbf{abort}]\!]_\mu = \lim_{n \to \infty} [\![(\mathbf{if}\ e\ \mathbf{then}\ c)^n]\!]_\mu.$$

This follows by Lemma 1, since lower approximations are below approximations so the limit of their weights (and the weight of their limit) is 1. It will be useful to identify programs that terminate with probability 1.

**Definition 8 (Lossless).** *A statement* $s$ *is* lossless *if for every sub-distribution* $\mu$*,* $|[\![s]\!]_\mu| = |\mu|$*, where* $|\mu|$ *is the total probability of* $\mu$*. Programs that are not lossless are called* lossy*.*

Informally, a program is lossless if all probabilistic assignments sample from full distributions rather than sub-distributions, there are no **abort** instructions, and the program is almost surely terminating, i.e. infinite traces have probability zero. Note that if we restrict the language to sample from full distributions, then losslessness coincides with almost sure termination.
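A quick numeric illustration of Definition 8 (again a toy encoding of ours, not the paper's tooling): sampling from a full distribution preserves the total mass, while an abort branch loses it.

```python
# Losslessness check: |[[s]]_mu| = |mu| for a lossless step, strictly less
# for a lossy one. Illustrative dict-based encoding only.

def bind(mu, f):
    out = {}
    for a, p in mu.items():
        for b, q in f(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

def weight(mu):
    return sum(mu.values())

coin = lambda x: {0: 0.5, 1: 0.5}                    # x <-$ coin (full distribution)
abort_if_one = lambda x: {} if x == 1 else {x: 1.0}  # if x = 1 then abort

mu = {0: 0.5, 1: 0.5}
lossless_out = bind(mu, coin)          # mass preserved
lossy_out = bind(mu, abort_if_one)     # half the mass is dropped
```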

Another important class of loops comprises loops with a uniform upper bound on the number of iterations. Formally, we say that a loop $\mathbf{while}\ e\ \mathbf{do}\ s$ is *certainly terminating* if there exists $k$ such that for every sub-distribution $\mu$, we have $|[\![\mathbf{while}\ e\ \mathbf{do}\ s]\!]_\mu| = |[\![(\mathbf{if}\ e\ \mathbf{then}\ s)^k]\!]_\mu|$. Note that certain termination of a loop does not entail losslessness: the output distribution of the loop may not have weight 1, for instance, if the loop samples from a sub-distribution or if the loop aborts with positive probability.

*Semantics of Procedure Calls and Adversaries.* The semantics of internal procedure calls is straightforward. Associated to each procedure name $f \in \mathcal{I}$, we assume a designated input variable $f_{\mathbf{arg}} \in \mathcal{X}^L_f$, a piece of code $f_{\mathbf{body}}$ that executes the function call, and a result expression $f_{\mathbf{res}}$. A function call $x \leftarrow \mathcal{I}(e)$ is then equivalent to $f_{\mathbf{arg}} \leftarrow e;\ f_{\mathbf{body}};\ x \leftarrow f_{\mathbf{res}}$. Procedures are subject to well-formedness criteria: procedures should only use local variables in their scope and only after initializing them, and should not perform recursive calls.

External procedure calls, also known as adversary calls, are a bit more involved. Each name $a \in \mathcal{A}$ is parametrized by a set $a_{\mathbf{ocl}} \subseteq \mathcal{I}$ of internal procedures which the adversary may call, a designated input variable $a_{\mathbf{arg}} \in \mathcal{X}^L_a$, an (unspecified) piece of code $a_{\mathbf{body}}$ that executes the function call, and a result expression $a_{\mathbf{res}}$. We assume that adversarial code can only access its local variables in $\mathcal{X}^L_a$ and can only make calls to procedures in $a_{\mathbf{ocl}}$. It is possible to impose more restrictions on adversaries, say, that they are lossless, but for simplicity we do not impose additional assumptions on adversaries here.

### **4 Proof System**

In this section we introduce a program logic for proving properties of probabilistic programs. The logic is abstract (assertions are arbitrary predicates on sub-distributions), but the meta-theoretic properties are clearest in this setting. In the following section, we give a concrete version suitable for practical use.

*Assertions and Closedness Conditions.* We use predicates on distributions over states.

**Definition 9 (Assertions).** *The set* Assn *of assertions is defined as* P(**SDist**(**State**))*. We write* η(μ) *for* μ ∈ η*.*

Usual set operations are lifted to assertions using their logical counterparts, e.g., $\eta \wedge \eta' \triangleq \eta \cap \eta'$ and $\neg\eta \triangleq \overline{\eta}$ (the complement of $\eta$). Our program logic uses a few additional constructions. Given a predicate $\phi$ over states, we define

$$
\Box \phi(\mu) \triangleq \forall m. m \in \text{supp}(\mu) \implies \phi(m).
$$

where $\mathrm{supp}(\mu)$ is the set of all states with non-zero probability under $\mu$. Intuitively, $\Box\phi$ holds when $\phi$ holds deterministically on all states that we may sample from the distribution. To reason about branching commands, given two assertions $\eta_1$ and $\eta_2$, we let

$$(\eta_1 \oplus \eta_2)(\mu) \triangleq \exists \mu_1, \mu_2.\ \mu = \mu_1 + \mu_2 \wedge \eta_1(\mu_1) \wedge \eta_2(\mu_2).$$

This assertion means that the sub-distribution is the sum of two sub-distributions such that $\eta_1$ holds on the first piece and $\eta_2$ holds on the second piece.

Given an assertion $\eta$ and an event $E \subseteq \mathbf{State}$, we let $\eta_{|E}(\mu) \triangleq \eta(\mu_{|E})$. This assertion holds exactly when $\eta$ is true on the portion of the sub-distribution satisfying $E$. Finally, given an assertion $\eta$ and a function $F$ from $\mathbf{SDist}(\mathbf{State})$ to $\mathbf{SDist}(\mathbf{State})$, we define $\eta[F] \triangleq \lambda\mu.\ \eta(F(\mu))$. Intuitively, $\eta[F]$ is true in a sub-distribution $\mu$ exactly when $\eta$ holds on $F(\mu)$.
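These constructions can be prototyped directly on finite sub-distributions. The sketch below is our own illustrative encoding: `box` and `restricted` follow the definitions literally, while for $\oplus$ we only check a canonical split induced by an event (a faithful check of the existential would have to search over all splits).

```python
# Assertion constructions over dict-based sub-distributions (illustrative).

def box(phi):
    """[]phi: phi holds on every state in the support of mu."""
    return lambda mu: all(phi(m) for m, p in mu.items() if p > 0)

def restrict(mu, event):
    """mu|E: keep only the mass on states satisfying the event E."""
    return {m: p for m, p in mu.items() if event(m)}

def restricted(eta, event):
    """eta|E(mu) = eta(mu|E)."""
    return lambda mu: eta(restrict(mu, event))

def oplus_by_event(eta1, eta2, event):
    """(eta1 (+) eta2) checked on the canonical split mu = mu|E + mu|not-E.
    This is a sufficient check only; the definition is existential."""
    return lambda mu: (eta1(restrict(mu, event)) and
                       eta2(restrict(mu, lambda m: not event(m))))

mu = {0: 0.25, 1: 0.75}
assert box(lambda m: m in (0, 1))(mu)
assert restricted(lambda nu: sum(nu.values()) == 0.75, lambda m: m == 1)(mu)
```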

Now, we can define the closedness properties of assertions. These properties will be critical to our rules for **while** loops.

### **Definition 10 (Closedness properties).** *A family of assertions* $(\eta_n)_{n \in \mathbb{N}^\infty}$ *is:*

– u-closed *if for every increasing convergent sequence* $(\mu_n)_{n\in\mathbb{N}}$ *of sub-distributions such that* $\eta_n(\mu_n)$ *for every* $n \in \mathbb{N}$*, we have* $\eta_\infty(\lim_{n\to\infty} \mu_n)$*;*
– t-closed *if for every convergent sequence* $(\mu_n)_{n\in\mathbb{N}}$ *such that* $\eta_n(\mu_n)$ *for every* $n \in \mathbb{N}$*, we have* $\eta_\infty(\lim_{n\to\infty} \mu_n)$*;*
– d-closed *if it is u-closed and downward closed, i.e.,* $\eta_n(\mu)$ *and* $\mu' \leq \mu$ *imply* $\eta_n(\mu')$ *for every* $n \in \mathbb{N}^\infty$*.*

*When* $(\eta_n)_n$ *is constant and equal to* $\eta$*, we say that* $\eta$ *is* u*-/*t*-/*d*-closed.*

Note that t-closedness implies u-closedness, but the converse does not hold. Moreover, u-closed, t-closed, and d-closed assertions are closed under arbitrary intersections and finite unions; in logical terms, they are closed under finite boolean combinations, universal quantification over arbitrary sets, and existential quantification over finite sets.

Finally, we introduce the necessary machinery for the frame rule. The set $\mathrm{mod}(s)$ of *modified* variables of a statement $s$ consists of all the variables on the left of a deterministic or probabilistic assignment. In this setting, we say that an assertion $\eta$ is *separated* from a set of variables $X$, written $\mathsf{separated}(\eta, X)$, if $\eta(\mu_1) \iff \eta(\mu_2)$ for any sub-distributions $\mu_1, \mu_2$ such that $|\mu_1| = |\mu_2|$ and $\mu_{1|\overline{X}} = \mu_{2|\overline{X}}$, where $\overline{X}$ is the complement of $X$ and, for a set of variables $X$, the restricted sub-distribution $\mu_{|X}$ is

$$\mu\_{|X} : m \in \mathbf{State}\_{|X} \mapsto \Pr\_{m' \sim \mu}[m = m'\_{|X}],$$

where $\mathbf{State}_{|X}$ and $m_{|X}$ restrict $\mathbf{State}$ and $m$ to the variables in $X$.

Intuitively, an assertion is separated from a set of variables X if every two sub-distributions that agree on the variables outside X either both satisfy the assertion, or both refute the assertion.

*Judgments and Proof Rules.* Judgments are of the form $\{\eta\}\ s\ \{\eta'\}$, where the assertions $\eta$ and $\eta'$ are drawn from Assn.

**Definition 11.** *A judgment* $\{\eta\}\ s\ \{\eta'\}$ *is* valid*, written* $\models \{\eta\}\ s\ \{\eta'\}$*, if* $\eta'([\![s]\!]_\mu)$ *holds for every interpretation of the adversarial procedures and every probabilistic state* $\mu$ *such that* $\eta(\mu)$*.*

Figure 2 describes the structural and basic rules of the proof system. Validity of judgments is preserved under standard structural rules, like the rule of consequence [Conseq]. As usual, the rule of consequence allows weakening the post-condition and strengthening the pre-condition; in our system, this rule serves as the interface between the program logic and mathematical theorems from probability theory. The [Exists] rule is helpful to deal with existentially quantified pre-conditions.

$$\frac{\eta \Rightarrow \eta_1 \qquad \{\eta_1\}\ s\ \{\eta_2\} \qquad \eta_2 \Rightarrow \eta'}{\{\eta\}\ s\ \{\eta'\}}\ [\mathrm{Conseq}]
\qquad
\frac{\forall v.\ \{\eta\}\ s\ \{\eta'\} \qquad v \notin \mathrm{FV}(\eta')}{\{\exists v.\ \eta\}\ s\ \{\eta'\}}\ [\mathrm{Exists}]$$

$$\frac{}{\{\eta\}\ \mathbf{skip}\ \{\eta\}}\ [\mathrm{Skip}]
\qquad
\frac{}{\{\eta\}\ \mathbf{abort}\ \{\Box\bot\}}\ [\mathrm{Abort}]
\qquad
\frac{}{\{\eta[[\![x \leftarrow e]\!]]\}\ x \leftarrow e\ \{\eta\}}\ [\mathrm{Assgn}]
\qquad
\frac{}{\{\eta[[\![x \xleftarrow{\$} g]\!]]\}\ x \xleftarrow{\$} g\ \{\eta\}}\ [\mathrm{Sample}]$$

$$\frac{\{\eta\}\ s_1\ \{\eta'\} \qquad \{\eta'\}\ s_2\ \{\eta''\}}{\{\eta\}\ s_1; s_2\ \{\eta''\}}\ [\mathrm{Seq}]
\qquad
\frac{\{\eta_1\}\ s_1\ \{\eta'_1\} \qquad \{\eta_2\}\ s_2\ \{\eta'_2\}}{\{(\eta_1)_{|e} \oplus (\eta_2)_{|\neg e}\}\ \mathbf{if}\ e\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2\ \{\eta'_1 \oplus \eta'_2\}}\ [\mathrm{Cond}]$$

$$\frac{\{\eta_1\}\ s\ \{\eta'_1\} \qquad \{\eta_2\}\ s\ \{\eta'_2\}}{\{\eta_1 \oplus \eta_2\}\ s\ \{\eta'_1 \oplus \eta'_2\}}\ [\mathrm{Split}]
\qquad
\frac{\mathsf{separated}(\eta, \mathrm{mod}(s)) \qquad s\ \mathrm{lossless}}{\{\eta\}\ s\ \{\eta\}}\ [\mathrm{Frame}]
\qquad
\frac{\{\eta\}\ f_{\mathbf{arg}} \leftarrow e;\ f_{\mathbf{body}};\ x \leftarrow f_{\mathbf{res}}\ \{\eta'\}}{\{\eta\}\ x \leftarrow f(e)\ \{\eta'\}}\ [\mathrm{Call}]$$

**Fig. 2.** Structural and basic rules

The rules for **skip**, assignments, random samplings and sequences are all straightforward. The rule for **abort** requires $\Box\bot$ to hold after execution; this assertion uniquely characterizes the resulting null sub-distribution. The rules for assignments and random samplings are semantic.

The rule [Cond] for conditionals requires the post-condition to be of the form $\eta_1 \oplus \eta_2$; this reflects the semantics of conditionals, which splits the initial probabilistic state depending on the guard, runs both branches, and recombines the resulting two probabilistic states.

The next two rules ([Split] and [Frame]) are useful for local reasoning. The [Split] rule reflects the additivity of the semantics and combines the pre- and post-conditions using the <sup>⊕</sup> operator. The [Frame] rule asserts that lossless statements preserve assertions that are not influenced by modified variables.

The rule [Call] for internal procedures is as expected, replacing the procedure call f with its definition.

Figure 3 presents the rules for loops. We consider four rules specialized to the termination behavior. The [While] rule is the most general rule, as it deals with arbitrary loops. For simplicity, we explain the rule in the special case where the families of assertions are constant, i.e., we have $\eta_n = \eta$ and $\eta'_n = \eta'$. Informally, $\eta$ is the loop invariant and $\eta'$ is an auxiliary assertion used to prove the invariant. We require that $\eta'$ be u-closed, since the semantics of a loop is defined as the limit of its lower approximations. Moreover, the first premise ensures that starting from $\eta$, one guarded iteration of the loop re-establishes $\eta$; the second premise ensures that restricting a probabilistic state $\mu$ satisfying $\eta$ to $\neg e$ yields a probabilistic state satisfying $\eta'$. It is possible to give an alternative formulation where the second premise is replaced by the logical constraint $\eta \implies \eta'_{|\neg e}$. As usual, the post-condition of the loop is the conjunction of the invariant with the negation of the guard (more precisely in our setting, that the guard has probability 0).

The [While-AST] rule deals with lossless loops. For simplicity, we explain the rule in the special case where the family of assertions is constant, i.e., we have $\eta_n = \eta$. In this case, we know that lower approximations and approximations have the same limit, so we can directly prove an invariant that holds after one guarded iteration of the loop. On the other hand, we must now require that $\eta$ satisfy the stronger property of t-closedness.

The [While-D] rule handles arbitrary loops with a <sup>d</sup>-closed invariant; intuitively, restricting a sub-distribution that satisfies a downwards closed assertion η yields a sub-distribution which also satisfies η.

The [While-CT] rule deals with certainly terminating loops. In this case, there is no requirement on the assertions.

We briefly compare the rules from a verification perspective. If the assertion is d-closed, then the rule [While-D] is the easiest to use, since there is no need to prove any termination requirement. Alternatively, if we can prove certain termination of the loop, then the rule [While-CT] is the best to use, since it does not impose any condition on assertions. When the loop is lossless, there is no need to introduce an auxiliary assertion $\eta'$, which simplifies the proof goal.

$$\frac{\mathsf{uclosed}((\eta'_n)_{n\in\mathbb{N}^\infty}) \qquad \forall n.\ \{\eta_n\}\ \mathbf{if}\ e\ \mathbf{then}\ s\ \{\eta_{n+1}\} \qquad \forall n.\ \{\eta_n\}\ \mathbf{if}\ e\ \mathbf{then}\ \mathbf{abort}\ \{\eta'_n\}}{\{\eta_0\}\ \mathbf{while}\ e\ \mathbf{do}\ s\ \{\eta'_\infty\}}\ [\mathrm{While}]$$

$$\frac{\mathsf{tclosed}((\eta_n)_{n\in\mathbb{N}^\infty}) \qquad \forall n.\ \{\eta_n\}\ \mathbf{if}\ e\ \mathbf{then}\ s\ \{\eta_{n+1}\} \qquad \forall \mu.\ \eta_0(\mu) \implies |[\![\mathbf{while}\ e\ \mathbf{do}\ s]\!]_\mu| = 1}{\{\eta_0\}\ \mathbf{while}\ e\ \mathbf{do}\ s\ \{\eta_\infty \wedge \Box\neg e\}}\ [\mathrm{While\text{-}AST}]$$

$$\frac{\mathsf{dclosed}((\eta_n)_{n\in\mathbb{N}^\infty}) \qquad \forall n.\ \{\eta_n\}\ \mathbf{if}\ e\ \mathbf{then}\ s\ \{\eta_{n+1}\}}{\{\eta_0\}\ \mathbf{while}\ e\ \mathbf{do}\ s\ \{\eta_\infty\}}\ [\mathrm{While\text{-}D}]$$

$$\frac{\forall n.\ \{\eta_n\}\ \mathbf{if}\ e\ \mathbf{then}\ s\ \{\eta_{n+1}\} \qquad \forall \mu.\ \eta_0(\mu) \implies [\![(\mathbf{if}\ e\ \mathbf{then}\ s)^k]\!]_\mu = [\![\mathbf{while}\ e\ \mathbf{do}\ s]\!]_\mu}{\{\eta_0\}\ \mathbf{while}\ e\ \mathbf{do}\ s\ \{\eta_k \wedge \Box\neg e\}}\ [\mathrm{While\text{-}CT}]$$

**Fig. 3.** Rules for loops

$$\frac{\forall n \in \mathbb{N}^{\infty}.\ \mathsf{separated}(\eta_n, \{x\} \cup \mathcal{X}^L_a) \qquad \mathsf{dclosed}((\eta_n)_{n \in \mathbb{N}^{\infty}}) \qquad \forall f \in a_{\mathbf{ocl}},\ x' \in \mathcal{X}^L_a,\ e \in \mathcal{E},\ n \in \mathbb{N}.\ \{\eta_n\}\ x' \leftarrow f(e)\ \{\eta_{n+1}\}}{\{\eta_0\}\ x \leftarrow \mathcal{A}(e)\ \{\eta_\infty\}}\ [\mathrm{Adv}]$$

**Fig. 4.** Rules for adversaries

Note however that it might still be beneficial to use the [While] rule, even for lossless loops, because of the weaker requirement that the invariant is u-closed rather than t-closed.

Finally, Fig. 4 gives the adversary rule for general adversaries. It is highly similar to the general rule [While-D] for loops since the adversary may make an arbitrary sequence of calls to the oracles in a**ocl** and may not be lossless. Intuitively, η plays the role of the invariant: it must be d-closed and it must be preserved by every oracle call with arbitrary arguments. If this holds, then η is also preserved by the adversary call. Some framing conditions are required, similar to the ones of the [Frame] rule: the invariant must not be influenced by the state writable by the external procedures.

It is possible to give other variants of the adversary rule with more general invariants by restricting the adversary, e.g., requiring losslessness or bounding the number of calls the external procedure can make to oracles, leading to rules akin to the almost surely terminating and certainly terminating loop rules, respectively.

*Soundness and Relative Completeness.* Our proof system is sound with respect to the semantics.

**Theorem 1 (Soundness).** *Every judgment* $\{\eta\}\ s\ \{\eta'\}$ *provable using the rules of our logic is valid.*

Completeness of the logic follows from the next lemma, whose proof makes essential use of the [While] rule. In the sequel, we use $\mathbf{1}_\mu$ to denote the characteristic assertion of a probabilistic state $\mu$, stating that the current probabilistic state is equal to $\mu$.

**Lemma 5.** *For every probabilistic state* $\mu$ *and statement* $s$*, the following judgment is provable using the rules of the logic:*

$$\{\mathbf{1}_\mu\}\ s\ \{\mathbf{1}_{[\![s]\!]_\mu}\}.$$

*Proof.* By induction on the structure of s.

– The cases $s = \mathbf{abort}$, $s = \mathbf{skip}$, $s = x \leftarrow e$ and $s = x \xleftarrow{\$} g$ are trivial.
– $s = s_1; s_2$: we have to prove

$$\{\mathbf{1}_\mu\}\ s_1; s_2\ \{\mathbf{1}_{[\![s_2]\!]_{[\![s_1]\!]_\mu}}\}.$$

We apply the [Seq] rule with $\eta_1 = \mathbf{1}_{[\![s_1]\!]_\mu}$; both premises can be directly proved using the induction hypothesis.

– $s = \mathbf{if}\ e\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2$: we have to prove

$$\{\mathbf{1}_\mu\}\ \mathbf{if}\ e\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2\ \{\mathbf{1}_{[\![s_1]\!]_{\mu_{|e}}} \oplus \mathbf{1}_{[\![s_2]\!]_{\mu_{|\neg e}}}\}.$$

We apply the [Conseq] rule to be able to apply the [Cond] rule with $\eta_1 = \mathbf{1}_{[\![s_1]\!]_{\mu_{|e}}}$ and $\eta_2 = \mathbf{1}_{[\![s_2]\!]_{\mu_{|\neg e}}}$. Both premises can be proved by an application of the [Conseq] rule followed by an application of the induction hypothesis.
– $s = \mathbf{while}\ e\ \mathbf{do}\ s'$: we have to prove

$$\{\mathbf{1}_\mu\}\ \mathbf{while}\ e\ \mathbf{do}\ s'\ \{\mathbf{1}_{\lim_{n\to\infty} [\![(\mathbf{if}\ e\ \mathbf{then}\ s')^n;\ \mathbf{if}\ e\ \mathbf{then}\ \mathbf{abort}]\!]_\mu}\}.$$

We first apply the [While] rule with $\eta_n = \mathbf{1}_{[\![(\mathbf{if}\ e\ \mathbf{then}\ s')^n]\!]_\mu}$ and $\eta'_n = \mathbf{1}_{[\![(\mathbf{if}\ e\ \mathbf{then}\ s')^n;\ \mathbf{if}\ e\ \mathbf{then}\ \mathbf{abort}]\!]_\mu}$.

For the first premise we apply the same process as for the conditional case: we apply the [Conseq] and [Cond] rules and conclude using the induction hypothesis (and the [Skip] rule). For the second premise we follow the same process but conclude using the [Abort] rule instead of the induction hypothesis. Finally, we conclude since $\mathsf{uclosed}((\eta'_n)_{n\in\mathbb{N}^\infty})$ holds.

The abstract logic is also relatively complete. This property will be less important for our purposes, but it serves as a basic sanity check.

#### **Theorem 2 (Relative completeness).** *Every valid judgment is derivable.*

*Proof.* Consider a valid judgment $\{\eta\}\ s\ \{\eta'\}$. Let $\mu$ be a probabilistic state such that $\eta(\mu)$. By Lemma 5, the judgment $\{\mathbf{1}_\mu\}\ s\ \{\mathbf{1}_{[\![s]\!]_\mu}\}$ is provable. Using the validity of the judgment and [Conseq], we have $\{\mathbf{1}_\mu \wedge \eta(\mu)\}\ s\ \{\eta'\}$. Using the [Exists] and [Conseq] rules, we conclude $\{\eta\}\ s\ \{\eta'\}$ as required.

The side-conditions in the loop rules (e.g., uclosed/tclosed/dclosed and the weight conditions) are difficult to prove, since they are semantic properties. Next, we present a concrete version of the logic with easy-to-check, syntactic sufficient conditions.

### **5 A Concrete Program Logic**

To give a more practical version of the logic, we begin by fixing a concrete syntax for assertions.

*Assertions.* We use a two-level assertion language, presented in Fig. 5. A *probabilistic assertion* $\eta$ is a formula built from comparisons of probabilistic expressions, using first-order quantifiers and connectives, and the special connective $\oplus$. A *probabilistic expression* $p$ can be a logical variable $v$, an operator applied to probabilistic expressions $o(\bar{p})$ (constants are 0-ary operators), or the expectation $\mathbb{E}[\tilde{e}]$ of a state expression $\tilde{e}$. A *state expression* $\tilde{e}$ is either a program variable $x$, the characteristic function $\mathbf{1}_\phi$ of a state assertion $\phi$, an operator applied to state expressions $o(\bar{\tilde{e}})$, or the expectation $\mathbb{E}_{v\sim g}[\tilde{e}]$ of a state expression $\tilde{e}$ in a given distribution $g$. Finally, a *state assertion* $\phi$ is a first-order formula over program variables. Note that the set of operators is left unspecified, but we assume that all the expressions in $\mathcal{E}$ and $\mathcal{D}$ can be encoded by operators.

#### **Fig. 5.** Assertion syntax

The interpretation of the concrete syntax is as expected. The interpretation of a probabilistic assertion is relative to a valuation $\rho$ which maps logical variables to values, and is an element of Assn. The definition of the interpretation is straightforward; the only interesting case is $[\![\mathbb{E}[\tilde{e}]]\!]^\rho_\mu$, which is defined by $\mathbb{E}_{m\sim\mu}[[\![\tilde{e}]\!]^\rho_m]$, where $[\![\tilde{e}]\!]^\rho_m$ is the interpretation of the state expression $\tilde{e}$ in the memory $m$ and valuation $\rho$. The interpretation of state expressions is a mapping from memories to values, which can be lifted to a mapping from distributions over memories to distributions over values. The definition is straightforward; the most interesting case is the expectation $[\![\mathbb{E}_{v\sim g}[\tilde{e}]]\!]^\rho_m \triangleq \mathbb{E}_{w\sim [\![g]\!]^\rho_m}[[\![\tilde{e}]\!]^{\rho[v:=w]}_m]$. We present the full interpretations in the supplemental materials.

Many standard concepts from probability theory have a natural representation in our syntax. For example:

– probabilistic independence of the state expressions $\tilde{e}_1, \ldots, \tilde{e}_n$ is modeled by the probabilistic assertion<sup>4</sup>

$$\forall v_1 \ldots v_n,\ \Pr[\top]^{n-1} \cdot \Pr[\bigwedge_{i=1\ldots n}\tilde{e}_i = v_i] = \prod_{i=1\ldots n} \Pr[\tilde{e}_i = v_i];$$

– the fact that a distribution is proper is modeled by the probabilistic assertion $\mathcal{L} \triangleq \Pr[\top] = 1$;

<sup>4</sup> The term $\Pr[\top]^{n-1}$ is necessary since we work with sub-distributions.

– the fact that a state expression $\tilde{e}$ is distributed according to a law $g$ is modeled by the probabilistic assertion

$$
\tilde{e} \sim g \stackrel{\triangle}{=} \forall w, \; \Pr[\tilde{e} = w] = \mathbb{E}[\mathbb{E}\_{v \sim g}[\mathbf{1}\_{v=w}]].
$$

The inner expectation computes the probability that $v$ drawn from $g$ is equal to a fixed $w$; the outer expectation averages this inner probability over the distribution of program states, since $g$ may depend on the state.
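This encoding can be sanity-checked numerically. In the toy fragment below (our own encoding, not the paper's), states are simply values of $x$, $g$ is a fair coin, and both sides of the defining equation agree:

```python
# Numeric check of   e ~ g  :=  forall w, Pr[e = w] = E[ E_{v~g}[ 1_{v=w} ] ]
# States are the values of x; mu is the state distribution. Illustrative only.

mu = {0: 0.5, 1: 0.5}               # state distribution: x is a fair coin
g = lambda m: {0: 0.5, 1: 0.5}      # distribution expression g in state m

def pr_e_eq(w):
    """Left-hand side: Pr[x = w] under mu."""
    return sum(p for m, p in mu.items() if m == w)

def rhs(w):
    """Right-hand side: E_{m~mu}[ E_{v~g(m)}[ 1_{v=w} ] ]."""
    return sum(p * g(m).get(w, 0.0) for m, p in mu.items())
```

Here `g` is constant, but the same computation goes through when the sampled distribution depends on the state, which is exactly why the outer (state) expectation is needed.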

We can easily define the $\Box$ operator from the previous section in our new syntax: $\Box\phi \triangleq \Pr[\neg\phi] = 0$.

*Syntactic Proof Rules.* Now that we have a concrete syntax for assertions, we can give syntactic versions of many of the existing proof rules. Such proof rules are often easier to use since they avoid reasoning about the semantics of commands and assertions. We tackle the non-looping rules first, beginning with the following syntactic rules for assignment and sampling:

$$\frac{}{\{\eta[x := e]\}\ x \leftarrow e\ \{\eta\}}\ [\mathrm{Assgn}] \qquad \frac{}{\{\mathcal{P}^g_x(\eta)\}\ x \xleftarrow{\$} g\ \{\eta\}}\ [\mathrm{Sample}]$$

The rule for assignment is the usual rule from Hoare logic, replacing the program variable $x$ by the corresponding expression $e$ in the pre-condition. The substitution $\eta[x := e]$ is performed recursively on the probabilistic assertion $\eta$; for instance, for expectations it is defined by $\mathbb{E}[\tilde{e}][x := e] \triangleq \mathbb{E}[\tilde{e}[x := e]]$, where $\tilde{e}[x := e]$ is the syntactic substitution.
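The assignment rule can be illustrated concretely. The following sketch (our toy encoding, not the paper's implementation) checks, for $\eta := (\mathbb{E}[x] = c)$, that the substituted pre-condition $\mathbb{E}[x+1]$ evaluated before $x \leftarrow x + 1$ equals $\mathbb{E}[x]$ evaluated after:

```python
# [Assgn] sanity check: E[x][x := x+1] = E[x+1] before equals E[x] after.
# Dict-based sub-distributions of the single variable x; illustrative only.

mu = {0: 0.25, 1: 0.25, 2: 0.5}      # distribution of x before the assignment

def expect(mu, f):
    """E[f(x)] under mu."""
    return sum(p * f(m) for m, p in mu.items())

# Run the deterministic assignment x <- x + 1 on every state.
mu_after = {}
for m, p in mu.items():
    mu_after[m + 1] = mu_after.get(m + 1, 0.0) + p

pre_value = expect(mu, lambda x: x + 1)     # substituted pre-condition, before
post_value = expect(mu_after, lambda x: x)  # original assertion, after
```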

The rule for sampling uses the probabilistic substitution operator $\mathcal{P}^g_x(\eta)$, which replaces all occurrences of $x$ in $\eta$ by a new integration variable $t$ and records that $t$ is drawn from $g$; the operator is defined in Fig. 6.

### **Fig. 6.** Syntactic op. P (main cases)

Next, we turn to the loop rules. The side-conditions from Fig. 3 are purely semantic, while in practice it is more convenient to use sufficient conditions expressed in the Hoare logic itself. We give sufficient conditions for ensuring certain and almost-sure termination in Fig. 7, where $\tilde{e}$ is an integer-valued expression. The first side-condition $\mathcal{C}_{\mathrm{CTerm}}$ shows certain termination given a strictly decreasing *variant* $\tilde{e}$ that is bounded below, similar to how a decreasing variant shows termination for deterministic programs. The second side-condition $\mathcal{C}_{\mathrm{ASTerm}}$ shows almost-sure termination given a probabilistic variant $\tilde{e}$, which must be bounded both above and below. While $\tilde{e}$ may increase with some probability, it must decrease with strictly positive probability. This condition was previously considered by [17] for probabilistic transition systems and was also used in expectation-based approaches [20,33]. Our framework can also support more refined conditions (e.g., based on super-martingales [9,31]), but the condition $\mathcal{C}_{\mathrm{ASTerm}}$ already suffices for most randomized algorithms.

$$\begin{aligned}
\mathcal{C}_{\mathrm{CTerm}} \triangleq\ & \{\mathcal{L} \wedge \Box(\tilde{e} = k \wedge 0 < k \wedge b)\}\ s\ \{\mathcal{L} \wedge \Box(\tilde{e} < k)\}\\
& \vdash \eta \Rightarrow (\exists y.\ \Box(\tilde{e} \leq y)) \wedge \Box(\tilde{e} = 0 \Rightarrow \neg b)\\
\mathcal{C}_{\mathrm{ASTerm}} \triangleq\ & \{\mathcal{L} \wedge \Box(\tilde{e} = k \wedge 0 < k \leq K \wedge b)\}\ s\ \{\mathcal{L} \wedge \Box(0 \leq \tilde{e} \leq K) \wedge \Pr[\tilde{e} < k] \geq \epsilon\}\\
& \vdash \eta \Rightarrow \Box(0 \leq \tilde{e} \leq K \wedge (\tilde{e} = 0 \Rightarrow \neg b))\\
& \vdash \mathsf{tclosed}(\eta)
\end{aligned}$$

#### **Fig. 7.** Side-conditions for loop rules

While t-closedness is a semantic condition (cf. Definition 10), there are simple syntactic conditions to guarantee it. For instance, assertions that carry a non-strict comparison ∈{≤, ≥, =} between two bounded probabilistic expressions are t-closed; the assertion stating probabilistic independence of a set of expressions is t-closed.

*Precondition Calculus.* With a concrete syntax for assertions, we are also able to incorporate syntactic reasoning principles. One classic tool is Morgan and McIver's *greatest pre-expectation*, which we take as inspiration for a pre-condition calculus for the loop-free fragment of Ellora. Given an assertion η and a loop-free statement s, we mechanically construct an assertion η∗ that is the pre-condition of s that implies η as a post-condition. The basic idea is to replace each expectation expression p inside η by an expression p∗ that has the same denotation before running s as p has after running s. This process yields an assertion η∗ that, interpreted before running s, is logically equivalent to η interpreted after running s.

The computation rules for pre-conditions are defined in Fig. 8. For a probability assertion η, its pre-condition pc(s, η) corresponds to η where the expectation expressions of the form E[˜e] are replaced by their corresponding *pre-term*, pe(s, E[˜e]). Pre-terms correspond loosely to Morgan and McIver's *pre-expectations*; we will make this correspondence more precise in the next section. The main interesting cases for computing pre-terms are random sampling and conditionals. For random sampling the result is P<sup>g</sup><sub>x</sub>(E[˜e]), which corresponds to the [Sample] rule. For conditionals, the expectation expression is split into a part where e is true and a part where e is not true. We restrict the expectation to the part satisfying e with the operator E[˜e]|<sub>e</sub> ≜ E[˜e · **1**<sub>e</sub>], which corresponds to the expected value of ˜e on the portion of the distribution where e is true. Then, we can build the pre-condition calculus into Ellora.
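As an illustration of the pre-term computation, the following Python sketch (our own toy encoding, not the Ellora implementation) interprets loop-free statements over discrete distributions of memories and checks the defining property: the expectation of the pre-term pe(s, E[e]) *before* running s equals the expectation of e *after* running s.

```python
# Statements (loop-free): ('assign', x, f), ('sample', x, d), ('seq', s1, s2),
# ('if', c, s1, s2). A distribution over memories is a list of (memory, prob)
# pairs; f(mem) evaluates an expression and d(mem) returns a list of (value, prob).

def run(s, dist):
    """Forward semantics: push a distribution over memories through s."""
    kind = s[0]
    if kind == 'assign':
        _, x, f = s
        return [({**m, x: f(m)}, p) for m, p in dist]
    if kind == 'sample':
        _, x, d = s
        return [({**m, x: v}, p * q) for m, p in dist for v, q in d(m)]
    if kind == 'seq':
        return run(s[2], run(s[1], dist))
    _, c, s1, s2 = s  # 'if'
    return (run(s1, [(m, p) for m, p in dist if c(m)]) +
            run(s2, [(m, p) for m, p in dist if not c(m)]))

def expect(dist, e):
    """Expected value of expression e under distribution dist."""
    return sum(p * e(m) for m, p in dist)

def pe(s, e):
    """Pre-term: its expectation *before* s equals that of e *after* s."""
    kind = s[0]
    if kind == 'assign':
        _, x, f = s
        return lambda m: e({**m, x: f(m)})
    if kind == 'sample':
        _, x, d = s
        return lambda m: sum(q * e({**m, x: v}) for v, q in d(m))
    if kind == 'seq':
        return pe(s[1], pe(s[2], e))
    _, c, s1, s2 = s  # 'if': split on the guard, as in Fig. 8
    p1, p2 = pe(s1, e), pe(s2, e)
    return lambda m: p1(m) if c(m) else p2(m)

# Example: x <-$ Bernoulli(1/2); c <- c + x, with post-expectation E[c].
coin = lambda m: [(0, 0.5), (1, 0.5)]
s = ('seq', ('sample', 'x', coin), ('assign', 'c', lambda m: m['c'] + m['x']))
init = [({'c': 3}, 1.0)]
post = lambda m: m['c']
assert abs(expect(run(s, init), post) - expect(init, pe(s, post))) < 1e-9
```

Here the conditional case evaluates the guard per memory, which is the discrete analogue of the E[˜e]|<sub>e</sub> splitting used in the paper.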

**Theorem 1.** *Let* s *be a non-looping command. Then, the following rule is derivable in the concrete version of* Ellora*:*

$$\overline{\{pc(s,\eta)\}\,s\,\{\eta\}}\,^{[\text{PC}]}$$

### **6 Case Studies: Embedding Lightweight Logics**

While Ellora is suitable for general-purpose reasoning about probabilistic programs, in practice humans typically use more special-purpose proof

$$\begin{aligned} \text{pe}(s\_1; s\_2, \mathbb{E}[\tilde{e}]) & \stackrel{\triangle}{=} \text{pe}(s\_1, \text{pe}(s\_2, \mathbb{E}[\tilde{e}])) \\ \text{pe}(x \leftarrow e, \mathbb{E}[\tilde{e}]) & \stackrel{\triangle}{=} \mathbb{E}[\tilde{e}][x := e] \\ \text{pe}(x \xleftarrow{\$} g, \mathbb{E}[\tilde{e}]) & \stackrel{\triangle}{=} \mathcal{P}\_x^g(\mathbb{E}[\tilde{e}]) \\ \text{pe}(\text{if } e \text{ then } s\_1 \text{ else } s\_2, \mathbb{E}[\tilde{e}]) & \stackrel{\triangle}{=} \text{pe}(s\_1, \mathbb{E}[\tilde{e}])\_{|e} + \text{pe}(s\_2, \mathbb{E}[\tilde{e}])\_{|\neg e} \end{aligned}$$

**Fig. 8.** Precondition calculus (selected)

techniques—often targeting just a single, specific kind of property, like probabilistic independence—when proving probabilistic assertions. When these techniques apply, they can be a convenient and powerful tool.

To capture this intuitive style of reasoning, researchers have considered lightweight program logics where the assertions and proof rules are tailored to a specific proof technique. We demonstrate how to integrate these tools in an assertion-based logic by introducing and embedding a new logic for reasoning about independence and distribution laws, useful properties when analyzing randomized algorithms. We crucially rely on the rich assertions in Ellora; it is not clear how to extend expectation-based approaches to support similar, lightweight reasoning. Then, we show how to embed the union bound logic [4] for proving accuracy bounds.

#### **6.1 Law and Independence Logic**

We begin by describing the law and independence logic IL, a proof system with intuitive rules that are easy to apply and amenable to automation. For simplicity, we only consider programs which sample from the binomial distribution and have deterministic control flow; for lack of space, we also omit procedure calls.

**Definition 12 (Assertions).** IL *assertions have the grammar:*

$$\xi := \det(e) \mid \#E \mid e \sim \text{B}(e, p) \mid \top \mid \perp \mid \xi \wedge \xi$$

*where* e *is an expression,* E *a set of expressions, and* p ∈ [0, 1]*.*

The assertion det(e) states that e is deterministic in the current distribution, i.e., there is at most one element in the support of its interpretation. The assertion #E states that the expressions in E are independent, as formalized in the previous section. The assertion e ∼ B(m, p) states that e is distributed according to a binomial distribution with parameter m (where m can be an expression) and constant probability p, i.e. the probability that e = k equals the probability that exactly k of m independent flips of a biased coin come up heads, where the coin returns heads with probability p.

Assertions can be seen as an instance of a logical abstract domain, where the order between assertions is given by implication based on a small number of axioms. Examples of such axioms include independence of singletons, irreflexivity of independence, anti-monotonicity of independence, an axiom for the sum of binomial distributions, and rules for deterministic expressions:

$$\#\{x\} \qquad \#\{x,x\} \iff \det(x) \qquad \qquad \#(E \cup E') \implies \#E$$

$$e \sim \mathcal{B}(m,p) \land e' \sim \mathcal{B}(m',p) \land \#\{e,e'\} \implies e + e' \sim \mathcal{B}(m+m',p)$$

$$\bigwedge\_{1 \le i \le n} \det(e\_i) \implies \det(f(e\_1,\ldots,e\_n))$$
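The binomial-sum axiom can be checked numerically. The sketch below (illustrative only, with hypothetical parameter values m = 4, m′ = 3, p = 0.3) convolves the probability mass functions of two independent binomials with the same bias p and compares the result against the pmf of B(m + m′, p):

```python
from math import comb

def binom_pmf(n, p):
    """Probability mass function of B(n, p), as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """PMF of the sum of two independent random variables with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

m, m2, p = 4, 3, 0.3
lhs = convolve(binom_pmf(m, p), binom_pmf(m2, p))  # law of e + e' given #{e, e'}
rhs = binom_pmf(m + m2, p)
assert all(abs(x - y) < 1e-12 for x, y in zip(lhs, rhs))
```

The independence premise #{e, e′} is exactly what licenses taking the convolution of the two laws.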

**Definition 13.** *Judgments of the logic are of the form* {ξ} s {ξ′}*, where* ξ *and* ξ′ *are* IL*-assertions. A judgment is* valid *if it is derivable from the rules of Fig. 9; the structural rules and the rule for sequential composition are similar to those from Sect. 4 and are omitted.*

The rule [IL-Assgn] for deterministic assignments is as in Sect. 4. The rule [IL-Sample] for random assignments yields as post-condition that the variable x and a set of expressions E are independent, assuming that E is independent before the sampling, and moreover that x follows the law of the distribution that it is sampled from. The rule [IL-Cond] for conditionals requires that the guard is deterministic and that each of the branches satisfies the specification; if the guard is not deterministic, there are simple examples where the rule is not sound. The rule [IL-While] for loops requires that the loop is certainly terminating with a deterministic guard. Note that the requirement of certain termination could be avoided by restricting the structural rules such that a statement s has deterministic control flow whenever {ξ} s {ξ′} is derivable.

We now turn to the embedding. The embedding of IL assertions into general assertions is immediate, except for det(e), which is translated as □e ∨ □¬e. In the following, we abuse notation and write ξ also for the translation of ξ.

**Theorem 2** (Embedding and soundness of IL logic)**.** *If* {ξ} s {ξ′} *is derivable in the* IL *logic, then its translation is derivable in (the syntactic variant of )* Ellora*. As a consequence, every derivable judgment* {ξ} s {ξ′} *is valid.*

*Proof sketch.* By induction on the derivation. The interesting cases are conditionals and loops. For conditionals, the soundness follows from the soundness of the rule:

$$\begin{array}{c} \{\eta\} \, s\_1 \, \{\eta'\} \qquad \{\eta\} \, s\_2 \, \{\eta'\} \qquad \Box e \lor \Box \neg e\\ \hline \{\eta\} \, \text{if } e \text{ then } s\_1 \text{ else } s\_2 \, \{\eta'\} \end{array}$$

To prove the soundness of this rule, we proceed by case analysis on □e ∨ □¬e. We treat the case □e; the other case is similar. In this case, η is equivalent to η₁ ∧ □e ⊕ η₂ ∧ □¬e, where η₁ = η and η₂ = □⊥. Let η′₁ = η′ and η′₂ = □⊥; again, η′₁ ⊕ η′₂ is logically equivalent to η′. The soundness of the rule thus follows from

$$\begin{array}{c} \{\xi[x := e]\}\ x \leftarrow e\ \{\xi\}\ \text{[IL-Assgn]} \\\\ \dfrac{\{x\} \cap (\mathrm{FV}(E) \cup \mathrm{FV}(e)) = \emptyset}{\{\#E\}\ x \xleftarrow{\$} \mathrm{B}(e,p)\ \{\#(E \cup \{x\}) \land x \sim \mathrm{B}(e,p)\}}\ \text{[IL-Sample]} \\\\ \dfrac{\{\xi\}\ s\_1\ \{\xi'\} \qquad \{\xi'\}\ s\_2\ \{\xi''\}}{\{\xi\}\ s\_1; s\_2\ \{\xi''\}}\ \text{[IL-Seq]} \\\\ \dfrac{\{\xi\}\ s\_1\ \{\xi'\} \qquad \{\xi\}\ s\_2\ \{\xi'\} \qquad \xi \implies \mathsf{det}(b)}{\{\xi\}\ \text{if } b \text{ then } s\_1 \text{ else } s\_2\ \{\xi'\}}\ \text{[IL-Cond]} \\\\ \dfrac{\{\xi\}\ s\ \{\xi\} \qquad \xi \implies \mathsf{det}(b) \qquad \mathcal{C}\_{\text{CTerm}}}{\{\xi\}\ \text{while } b \text{ do } s\ \{\xi\}}\ \text{[IL-While]} \end{array}$$

**Fig. 9.** IL proof rules (selected)

the soundness of the [Cond] and [Conseq] rules. For loops, there exists a natural number n such that **while** b **do** s is semantically equivalent to (**if** b **then** s)<sup>n</sup>. By assumption {ξ} s {ξ} holds, and thus by the induction hypothesis it is derivable in Ellora. We also have ξ ⟹ det(b), and hence {ξ} **if** b **then** s {ξ}. We conclude by [Seq].

To illustrate our system IL, consider the statement s in Fig. 10, which repeatedly samples from binomial distributions and counts the total number of heads over N iterations. Using the logic, we prove that c ∼ B(N · (N + 1)/2, 1/2) is a post-condition for s. We take the invariant:

$$\mathbf{c} \sim \mathbf{B} \left( \mathbf{j} (\mathbf{j} + 1) / 2, 1 / 2 \right)$$

The invariant holds initially, as 0 ∼ B(0, 1/2). For the inductive case, we show:

$$\{\mathbf{c} \sim \mathbf{B}\left(\mathbf{j}(\mathbf{j}+1)/2, 1/2\right)\} \ s\_0 \left\{ \mathbf{c} \sim \mathbf{B}\left((\mathbf{j} + 1)(\mathbf{j} + 2)/2, 1/2\right) \right\}$$

where s<sub>0</sub> represents the loop body, i.e. x ←$ B(j + 1, 1/2); c ← c + x. First, we apply the rule for sequence taking as intermediate assertion

$$\mathbf{c} \sim \mathbf{B}\left(\mathbf{j}(\mathbf{j}+1)/2, 1/2\right) \land \mathbf{x} \sim \mathbf{B}\left(\mathbf{j}+1, 1/2\right) \land \#\{\mathbf{x}, \mathbf{c}\}$$

The first premise follows from the rule for random assignment and structural rules. The second premise follows from the rule for deterministic assignment and the rule of consequence, applying axioms about sums of binomial distributions.

**Fig. 10.** Sum of binomials

We briefly comment on several limitations of IL. First, IL is restricted to programs with deterministic control flow, but this restriction could be partially relaxed by enriching IL with assertions for conditional independence. Such assertions are already expressible in the logic of Ellora; adding conditional independence would significantly broaden the scope of the IL proof system and open the possibility of relying on axiomatizations of conditional independence (e.g., based on graphoids [36]). Second, the logic only supports sampling from binomial distributions. It is possible to enrich the language of assertions with clauses c ∼ g, where g can model other distributions, like the uniform distribution or the Laplace distribution. The main design challenge is finding a core set of useful facts about these distributions. Enriching the logic and automating the analysis are interesting avenues for further work.

### **6.2 Embedding the Union Bound Logic**

The program logic aHL [4] was recently introduced for estimating the accuracy of randomized computations. One main application of aHL is proving accuracy of randomized algorithms, both in the offline and online settings, i.e. with adversary calls. aHL is based on the union bound, a basic tool from probability theory, and has judgments of the form ⊨<sub>β</sub> {Φ} s {Ψ}, where s is a statement, Φ and Ψ are first-order formulae over program variables, and β is a probability, i.e. β ∈ [0, 1]. A judgment ⊨<sub>β</sub> {Φ} s {Ψ} is valid if for every memory m such that Φ(m), the probability of ¬Ψ in ⟦s⟧ m is upper bounded by β, i.e. Pr<sub>⟦s⟧ m</sub>[¬Ψ] ≤ β.

Figure 11 presents some key rules of aHL, including a rule for sampling from the Laplace distribution L centered around e. The predicate CTerm(k) indicates that the loop terminates in at most k steps on any memory that satisfies the pre-condition. Moreover, β is a function of .

$$\frac{}{\vDash\_\beta \{\top\}\ x \xleftarrow{\$} \mathcal{L}\_\epsilon(e)\ \{|x - e| \le \tfrac{1}{\epsilon}\log\tfrac{1}{\beta}\}}\ \text{[aHL-Sample]}$$

$$\frac{\vDash\_\beta \{\Phi\}\ s\_1\ \{\Theta\} \qquad \vDash\_{\beta'} \{\Theta\}\ s\_2\ \{\Psi\}}{\vDash\_{\beta+\beta'} \{\Phi\}\ s\_1; s\_2\ \{\Psi\}}\ \text{[aHL-Seq]}$$

$$\frac{\vDash\_\beta \{\Phi \land b\}\ s\ \{\Phi\} \qquad \mathcal{C}\_{\text{CTerm}}(k)}{\vDash\_{k\beta} \{\Phi\}\ \textbf{while } b \textbf{ do } s\ \{\Phi \land \neg b\}}\ \text{[aHL-While]}$$

**Fig. 11.** aHL proof rules (selected)

aHL has a simple embedding into Ellora.

**Theorem 3** (Embedding of aHL)**.** *If* ⊨<sub>β</sub> {Φ} s {Ψ} *is derivable in* aHL*, then* {□Φ} s {E[**1**<sub>¬Ψ</sub>] ≤ β} *is derivable in* Ellora*.*

### **7 Case Studies: Verifying Randomized Algorithms**

In this section, we demonstrate Ellora on a selection of examples; we present further examples in the supplemental material. Together, they exhibit the wide variety of proof techniques and reasoning principles available in Ellora's implementation.

*Hypercube Routing.* We begin with the *hypercube routing* algorithm [41,42]. Consider a network topology (the *hypercube*) where each node is labeled by a bitstring of length D, and two nodes are connected by an edge if and only if their labels differ in exactly one bit position.

In the network, there is initially one packet at each node, and each packet has a unique destination. The algorithm implements a routing strategy based on *bit fixing*: if the current position has bitstring i and the target node has bitstring j, we compare the bits of i and j from left to right, moving along the edge that corrects the first differing bit. Valiant's algorithm uses randomization to guarantee that the total number of steps grows *logarithmically* in the number of packets. In the first phase, each packet i selects an intermediate destination ρ(i) uniformly at random and uses bit fixing to reach ρ(i). In the second phase, each packet uses bit fixing to go from ρ(i) to its destination. We focus on the first phase since the reasoning for the second phase is nearly identical. We can model the strategy with the code in Fig. 12, using some syntactic sugar for the **for** loops.<sup>5</sup>

**Fig. 12.** Hypercube Routing

We assume that initially, the position of packet i is at node i (see Map.init). Then, we initialize the random intermediate destinations ρ. The remaining loop encodes the evaluation of the routing strategy iterated T times. The variable usedBy is a map that logs whether an edge is already used by a packet; it is emptied at the beginning of each iteration. For each packet, we try to move it across one edge along the path to its intermediate destination. The function getEdge returns the next edge to follow according to the bit-fixing scheme. If the packet can progress (its edge is not used), then its current position is updated and the edge is marked as used.

We show that if the number of timesteps T is 4D + 1, then all packets reach their intermediate destination in at most T steps, except with a small probability 2<sup>−2D</sup> of failure. That is, the number of timesteps grows linearly in D, i.e., logarithmically in the number of packets. This is formalized in our system as:

$$\{T = 4D + 1\}\ \text{route}\ \{\Pr[\exists i.\,\text{pos}[i] \neq \rho[i]] \leq 2^{-2D}\}$$

<sup>5</sup> Recall that the number of nodes in a hypercube of dimension D is 2<sup>D</sup>, so each node can be identified by a number in [1, 2<sup>D</sup>].
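The two ingredients above are the bit-fixing next-hop rule and the one-edge-per-round contention model. The following Python sketch (our own simulation with hypothetical helper names, not the code of Fig. 12) simulates phase one for a small hypercube and checks that every packet eventually reaches its random intermediate destination; in each round, at least one waiting packet moves, so termination is guaranteed:

```python
import random

def bit_fix_next(cur, dst):
    """Next node on the bit-fixing path: correct the first differing bit,
    reading the labels from left to right (most significant bit first)."""
    diff = cur ^ dst
    if diff == 0:
        return None  # already at the destination
    top = 1 << (diff.bit_length() - 1)
    return cur ^ top

def route_phase(D, rho, max_rounds=500):
    """Phase one: each round, a packet moves along its next edge if free."""
    n = 2**D
    pos = list(range(n))  # packet i starts at node i
    for t in range(max_rounds):
        used = set()  # edges claimed this round
        for i in range(n):
            nxt = bit_fix_next(pos[i], rho[i])
            if nxt is not None and (pos[i], nxt) not in used:
                used.add((pos[i], nxt))
                pos[i] = nxt
        if all(pos[i] == rho[i] for i in range(n)):
            return t + 1  # rounds taken
    return None

random.seed(42)
D = 4
rho = [random.randrange(2**D) for _ in range(2**D)]
rounds = route_phase(D, rho)
assert rounds is not None  # all packets reached their intermediate destinations
```

The 4D + 1 bound of the theorem is a high-probability statement for large D; the simulation only checks eventual delivery, not the precise bound.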

**Fig. 13.** Coupon collector

*Modeling Infinite Processes.* Our second example is the *coupon collector* process. The algorithm draws a uniformly random coupon (out of N kinds) on each day, terminating when it has drawn at least one coupon of each kind. The code of the algorithm is displayed in Fig. 13; the array cp records the coupons seen so far, t holds the number of steps taken before seeing a new coupon, and X tracks the total number of steps. Our goal is to bound the average number of iterations. This is formalized in our logic as:

$$\{\mathcal{L}\}\text{ coupon }\left\{\mathbb{E}[X] = \sum\_{i \in [1,N]} \left(\frac{N}{N-i+1}\right)\right\}.$$
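The closed form above can be sanity-checked by Monte Carlo simulation. The sketch below (illustrative only, not part of the formal development, with a hypothetical choice of N = 5) estimates E[X] and compares it with Σᵢ N/(N − i + 1) = N · H_N ≈ 11.42:

```python
import random

def coupon_steps(N, rng):
    """One run: draw uniform coupons until all N kinds have been seen."""
    seen, steps = set(), 0
    while len(seen) < N:
        seen.add(rng.randrange(N))
        steps += 1
    return steps

N = 5
exact = sum(N / (N - i + 1) for i in range(1, N + 1))  # = N * H_N
rng = random.Random(1)
est = sum(coupon_steps(N, rng) for _ in range(20000)) / 20000
assert abs(est - exact) < 0.3  # Monte Carlo estimate close to the closed form
```

Note that individual runs can be arbitrarily long; only the *expected* number of steps is bounded, which is exactly what the judgment above asserts.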

**Fig. 14.** Pairwise independence

*Limited Randomness.* *Pairwise independence* says that if we see the result of X<sub>i</sub>, we do not gain information about any other variable X<sub>k</sub>. However, if we see the results of *two* variables X<sub>i</sub>, X<sub>j</sub>, we may gain information about X<sub>k</sub>. There are many constructions in the algorithms literature that grow a small number of independent bits into a larger number of pairwise independent bits. Figure 14 gives one such procedure, where ⊕ is exclusive-or, and bits(j) is the set of positions set to 1 in the binary expansion of j. The proof uses the following fact, which we fully verify: for a uniformly distributed Boolean random variable Y and a random variable Z of any type,

$$Y \# Z \Rightarrow Y \oplus f(Z) \# g(Z) \tag{1}$$

for any two Boolean functions f, g. Then, note that X[i] = ⊕<sub>j∈bits(i)</sub> B[j], where the big XOR operator ranges over the indices j where the bit representation of i has bit j set. For any two distinct i, k ∈ [1,..., 2<sup>N</sup>], there is a bit position in [1,..., N] where i and k differ; call this position r and suppose it is set in i but not in k. By rewriting,

$$\mathbf{X}[i] = \mathbf{B}[r] \oplus \bigoplus\_{j \in \text{bits}(i)\setminus\{r\}} \mathbf{B}[j] \quad \text{and} \quad \mathbf{X}[k] = \bigoplus\_{j \in \text{bits}(k)\setminus\{r\}} \mathbf{B}[j].$$

Since the B[j] are all independent, X[i] # X[k] follows from Eq. (1), taking Z to be the distribution on the tuple B[1],..., B[N] excluding B[r]. This verifies pairwise independence:

$$\{\mathcal{L}\}\ \mathsf{PWInd}(\mathsf{N})\ \left\{ \mathcal{L} \land \forall i, k \in [2^{\mathsf{N}}].\ i \neq k \Rightarrow \#\{\mathsf{X}[i], \mathsf{X}[k]\} \right\}.$$
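The construction can be verified exhaustively for small N. The following sketch (our own encoding of the scheme described above, with indices i ranging over nonzero bit patterns) enumerates all 2^N equally likely seed vectors B and checks that every pair X[i], X[k] with i ≠ k is jointly uniform, hence independent:

```python
from itertools import product

N = 3
seeds = list(product([0, 1], repeat=N))  # all 2^N equally likely seeds B

def x_of(B, i):
    """X[i]: XOR of the seed bits B[j] at the positions set in i."""
    v = 0
    for j in range(N):
        if (i >> j) & 1:
            v ^= B[j]
    return v

M = 2**N - 1  # nonzero bit patterns 1 .. 2^N - 1
for i in range(1, M + 1):
    for k in range(1, M + 1):
        if i == k:
            continue
        for a in (0, 1):
            for b in (0, 1):
                joint = sum(1 for B in seeds
                            if x_of(B, i) == a and x_of(B, k) == b) / len(seeds)
                assert abs(joint - 0.25) < 1e-12  # uniform pair => independent
```

The check succeeds because distinct nonzero index sets give distinct nonzero linear functionals over GF(2), which is the content of Eq. (1) applied with position r.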

*Adversarial Programs.* Pseudorandom functions (PRF) and pseudorandom permutations (PRP) are two idealized primitives that play a central role in the design of symmetric-key systems. Although the most natural assumption to make about a blockcipher is that it behaves as a pseudorandom permutation, most commonly the security of such a system is analyzed by replacing the blockcipher with a perfectly random function. The PRP/PRF Switching Lemma [6,22] fills the gap: given a bound for the security of a blockcipher as a pseudorandom function, it gives a bound for its security as a pseudorandom permutation.

**Lemma 4** (PRP/PRF switching lemma). *Let* A *be an adversary with blackbox access to an oracle O implementing either a random permutation on* {0, 1}<sup>l</sup> *or a random function from* {0, 1}<sup>l</sup> *to* {0, 1}<sup>l</sup> *. Then the probability that the adversary* A *distinguishes between the two oracles in at most* q *calls is bounded by*

$$|\Pr\_{PRP}[b \land |H| \le q] - \Pr\_{PRF}[b \land |H| \le q]| \le \frac{q(q-1)}{2^{l+1}},$$

*where* H *is a map storing each adversary call and* |H| *is its size.*

Proving this lemma can be done by using the Fundamental Lemma of Game-Playing and bounding the probability of *bad* in the program from Fig. 15. We focus on the latter. Here we apply the [Adv] rule of Ellora with the invariant ∀k. Pr[bad ∧ |H| ≤ k] ≤ k(k−1)/2<sup>l+1</sup>, where |H| is the size of the map H, i.e. the number of adversary calls. Intuitively, the invariant says that at each call to the oracle, the probability that bad has been set before and that the number of adversary calls is at most k is bounded by a polynomial in k.

The invariant is d-closed and holds before the adversary call, since at that point Pr[bad] = 0. Then we need to prove that the oracle preserves the invariant, which can be done easily using the pre-condition calculus (the [PC] rule).
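The bound k(k−1)/2^{l+1} in the invariant is the usual birthday bound. As a quick numeric check (with hypothetical parameters q = 10 queries and l = 16 bits, not values from the lemma), the exact collision probability for q independent uniform l-bit strings never exceeds q(q−1)/2^{l+1}:

```python
def collision_prob(q, l):
    """Exact probability that q independent uniform l-bit strings collide."""
    n = 2**l
    p_distinct = 1.0
    for i in range(q):
        p_distinct *= (n - i) / n  # i-th draw avoids the previous i values
    return 1.0 - p_distinct

q, l = 10, 16
bound = q * (q - 1) / 2**(l + 1)  # the birthday bound q(q-1)/2^{l+1}
assert collision_prob(q, l) <= bound
```

The inequality holds because 1 − Π(1 − i/n) ≤ Σ i/n by the union bound, which is exactly the reasoning principle aHL packages into a logic.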


**Fig. 15.** PRP/PRF game

### **8 Implementation and Mechanization**

We have built a prototype implementation of Ellora within EasyCrypt [2,5], a theorem prover originally designed for verifying cryptographic protocols. EasyCrypt provides a convenient environment for constructing proofs in various Hoare logics, supporting interactive, tactic-based proofs for manipulating assertions and allowing users to invoke external tools, like SMT solvers, to discharge proof obligations. EasyCrypt provides a mature set of libraries for both data structures (sets, maps, lists, arrays, etc.) and mathematical theorems (algebra, real analysis, etc.), which we extended with theorems from probability theory.


We used the implementation to verify many examples from the literature, including all the programs presented in Sect. 7 as well as some additional examples in Table 1 (such as a polynomial identity test, private running sums, and properties of random walks). The verified proofs bear a strong resemblance to the existing paper proofs. Independently of this work, Ellora has been used to formalize the main theorem about a randomized gossip-based protocol for distributed systems [26, Theorem 2.1]. Some libraries developed in the scope of Ellora have been incorporated into the main branch of EasyCrypt, including a general library on probabilistic independence.

*A New Library for Probabilistic Independence.* In order to support assertions of the concrete program logic, we enhanced the standard libraries of EasyCrypt, notably the ones dealing with big operators and sub-distributions. Like all Easy-Crypt libraries, they are written in a foundational style, i.e. they are defined instead of axiomatized. A large part of our libraries are proved formally from first principles. However, some results, such as concentration bounds, are currently declared as axioms.

Our formalization of probabilistic independence deserves special mention. We formalized two different (but logically equivalent) notions of independence. The first is in terms of products of probabilities, and is based on heterogeneous lists. Since Ellora (like EasyCrypt) has no support for heterogeneous lists, we use a smart encoding based on second-order predicates. The second definition is more abstract, in terms of product and marginal distributions. While the first definition is easier to use when reasoning about randomized algorithms, the second definition is more suited for proving mathematical facts. We prove the two definitions equivalent, and formalize a collection of related theorems.

*Mechanized Meta-Theory.* The proofs of soundness and relative completeness of the abstract logic (without adversary calls) and the syntactic termination arguments have been mechanized in the Coq proof assistant. The development is available in the supplemental material.

### **9 Related Work**

*More on Assertion-Based Techniques.* The earliest assertion-based system is due to Ramshaw [37], who proposes a program logic where assertions can be formulas involving *frequencies*, essentially probabilities on sub-distributions. Ramshaw's logic allows assertions to be combined with operators like ⊕, similar to our approach. [18] presents a Hoare-style logic with general assertions on the distribution, allowing expected values and probabilities. However, his **while** rule is based on a semantic condition on the guarded loop body, which is less desirable for verification because it requires reasoning about the semantics of programs. [8] give decidability results for a probabilistic Hoare logic without **while** loops. We are not aware of any existing system that supports assertions about general expected values; existing works also restrict to Boolean distributions. [38] formalize a Hoare logic for probabilistic programs but unlike our work, their assertions are interpreted on *distributions* rather than sub-distributions. For conditionals, their semantics rescales the distribution of states that enter each branch. However, their assertion language is limited and they impose strong restrictions on loops.

*Other Approaches.* Researchers have proposed many other approaches to verify probabilistic programs. For instance, verification of Markov transition systems goes back to at least [17,40]; our condition for ensuring almost-sure termination in loops is directly inspired by their work. Automated methods include model checking (see e.g., [1,25,29]) and abstract interpretation (see e.g., [12,32]). Techniques for reasoning about higher-order (functional) probabilistic languages are an active subject of research (see e.g., [7,13,14]). For analyzing probabilistic loops in particular, there are tools for reasoning about running time. There are also automated systems for synthesizing invariants [3,11]. [9,10] use a martingale method to compute the expected time of the coupon collector process for N = 5; fixing N lets them focus on a program where the outer **while** loop is fully unrolled. Martingales are also used by [15] for analyzing probabilistic termination. Finally, there are approaches involving symbolic execution; [39] use a mix of static and dynamic analysis to check probabilistic programs from the approximate computing literature.

### **10 Conclusion and Perspectives**

We introduced an expressive program logic for probabilistic programs, and showed that assertion-based systems are suited for practical verification of probabilistic programs. Owing to their richer assertions, program logics are a more suitable foundation for specialized reasoning principles than expectation-based systems. As evidence, our program logic can be smoothly extended with custom reasoning for probabilistic independence and union bounds. Future work includes proving better accuracy bounds for differentially private algorithms, and exploring further integration of Ellora into EasyCrypt.

**Acknowledgments.** We thank the reviewers for their helpful comments. This work benefited from discussions with Dexter Kozen, Annabelle McIver, and Carroll Morgan. This work was partially supported by ERC Grant #679127, and NSF grant 1718220.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### Fine-Grained Semantics for Probabilistic Programs

Benjamin Bichsel(B) , Timon Gehr, and Martin Vechev

ETH Zürich, Zürich, Switzerland {benjamin.bichsel,timon.gehr,martin.vechev}@inf.ethz.ch

Abstract. Probabilistic programming is an emerging technique for modeling processes involving uncertainty. Thus, it is important to ensure these programs are assigned precise formal semantics that also cleanly handle typical exceptions such as non-termination or division by zero. However, existing semantics of probabilistic programs do not fully accommodate different exceptions and their interaction, often ignoring some or conflating multiple ones into a single exception state, making it impossible to distinguish exceptions or to study their interaction.

In this paper, we provide an expressive probabilistic programming language together with a fine-grained measure-theoretic denotational semantics that handles and distinguishes non-termination, observation failures and error states. We then investigate the properties of this semantics, focusing on the interaction of different kinds of exceptions. Our work helps to better understand the intricacies of probabilistic programs and ensures their behavior matches the intended semantics.

### 1 Introduction

A probabilistic programming language allows probabilistic models to be specified independently of the particular inference algorithms that make predictions using the model. Probabilistic programs are formed using standard language primitives as well as constructs for drawing random values and conditioning. The overall approach is general and applicable to many different settings (e.g., building cognitive models). In recent years, the interest in probabilistic programming systems has grown rapidly with various languages and probabilistic inference algorithms (ranging from approximate to exact). Examples include [10,11,13,14,25–27,29,36]; for a recent survey, please see [15]. An important branch of recent probabilistic programming research is concerned with providing a suitable semantics for these programs enabling one to formally reason about the program's behaviors [2–4,33–35].

Often, probabilistic programs require access to primitives that may result in unwanted behavior. For example, the standard deviation σ of a Gaussian distribution must be positive (sampling from a Gaussian distribution with negative standard deviation should result in an error). If a program samples from a Gaussian distribution with a non-constant standard deviation, it is in general undecidable whether that standard deviation is guaranteed to be positive. A similar situation occurs for while loops: except in some trivial cases, it is hard to decide if a program terminates with probability one (even harder than checking termination of deterministic programs [20]). However, general while loops are important for many probabilistic programs. As an example, a Markov Chain Monte Carlo sampler is essentially a special probabilistic program, which in practice requires a non-trivial stopping criterion (see e.g. [6] for such a stopping criterion). In addition to offering primitives that may result in such unwanted behavior, many probabilistic programming languages also provide an **observe** primitive that intuitively allows one to filter out executions violating some constraint.

*Motivation.* Measure-theoretic denotational semantics for probabilistic programs is desirable as it enables reasoning about probabilistic programs within the rigorous and general framework of measure theory. While existing research has made substantial progress towards a rigorous semantic foundation of probabilistic programming, existing denotational semantics based on measure theory usually conflate failing **observe** statements (i.e., conditioning), error states and non-termination, often modeling at least some of these as missing weight in a sub-probability measure (we show why this is practically problematic in later examples). This means that even semantically, it is impossible to distinguish these types of exceptions<sup>1</sup>. However, distinguishing exceptions is essential for a solid understanding of probabilistic programs: it is insufficient if the semantics of a probabilistic programming language can only express that *something* went wrong during the execution of the program, lacking the capability to distinguish, for example, non-termination and errors. Concretely, programmers often want to avoid non-termination and assertion failures, while observation failure is acceptable (or even desirable). When a program runs into an exception, the programmer should be able to determine the type of exception from the semantics.

*This Work.* This paper presents a clean denotational semantics for a Turing-complete first-order probabilistic programming language that supports mixing continuous and discrete distributions, arrays, observations, partial functions and loops. This semantics distinguishes observation failures, error states and non-termination by tracking them as explicit program states. Our semantics allows for fine-grained reasoning, such as determining the termination probability of a probabilistic program making observations from a sequence of concrete values.

In addition, we explain the consequences of our treatment of exceptions by providing interesting examples and properties of our semantics, such as commutativity in the absence of exceptions, or associativity regardless of the presence of exceptions. We also investigate the interaction between exceptions and the **score** primitive, concluding in particular that the probability of non-termination cannot be defined in this case. **score** intuitively allows increasing or decreasing the probability of specific runs of a program (for more details, see Sect. 5.3).

<sup>1</sup> In this paper, we refer to errors, non-termination and observation failures collectively as *exceptions*. For example, a division by zero is an error (and hence an exception), while non-termination is an exception but not an error.

### 2 Overview

In this section, we demonstrate several important features of our probabilistic programming language (PPL) using examples, followed by a discussion of how different kinds of exceptions interact.

#### 2.1 Features of Probabilistic Programs

In the following, we informally discuss the most important features of our PPL.

*Discrete and Continuous Primitive Distributions.* Listing 1 illustrates a simple Gaussian mixture model (the figure only shows the function body). Depending on the outcome of a fair coin flip x (resulting in 0 or 1), y is sampled from a Gaussian distribution with mean 0 or mean 2 (and standard deviation 1). Note that in our PPL, we represent **gauss**(·, ·) by the more general construct **sampleFrom**f(·, ·), with f : **R** × [0,∞) → **R** → **R** being the probability density function of the Gaussian distribution $f(\mu, \sigma)(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.
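As a sketch (not the paper's PPL), the mixture model of Listing 1 can be simulated in Python; `gaussian_mixture` and the sample-mean check are our own illustrative names, not part of the paper:

```python
import random

def gaussian_mixture(rng: random.Random) -> float:
    """Sketch of Listing 1: flip a fair coin, then sample from a
    Gaussian with mean 0 or mean 2 (standard deviation 1)."""
    x = rng.randrange(2)          # flip(1/2): 0 or 1
    mu = 0.0 if x == 0 else 2.0
    return rng.gauss(mu, 1.0)     # gauss(mu, 1)

rng = random.Random(0)
samples = [gaussian_mixture(rng) for _ in range(20000)]
mean = sum(samples) / len(samples)  # should be close to (0 + 2) / 2 = 1
```

With many samples, the empirical mean approaches 1, the average of the two component means.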

*Conditioning.* Listing 2 samples two independent values from the uniform distribution on the interval [0, 1] and conditions the possible values of x and y on the observation x + y > 1 before returning x. Intuitively, the first two lines express a priori knowledge about the uncertain values of x and y. Then, a measurement determines that x + y is greater than 1. We combine this new information



Listing 2. Conditioning on a continuous distribution

with the existing knowledge. Because x + y > 1 is more likely for larger values of x, the return value has larger weight on larger values. Formally, our semantics handles **observe** by introducing an extra program state ♯ for observation failure. Hence, the probability distribution after the third line of Listing 2 will put weight 1/2 on ♯ and weight 1/2 on those x and y satisfying x + y > 1.
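A Monte Carlo sketch of this behavior: the observation failure is modelled by an explicit outcome (called `SHARP` here, our own stand-in name) rather than by silently dropping weight. Roughly half the runs fail the observation, and among surviving runs the return value is biased towards larger x:

```python
import random

SHARP = "observation failure"   # explicit exception outcome (stand-in name)

def run(rng: random.Random):
    x = rng.random()            # x := uniform(0, 1)
    y = rng.random()            # y := uniform(0, 1)
    if not (x + y > 1):         # observe(x + y > 1)
        return SHARP            # failed observation becomes a tracked state
    return x                    # return x

rng = random.Random(1)
results = [run(rng) for _ in range(100_000)]
p_fail = results.count(SHARP) / len(results)   # ≈ 1/2
ok = [r for r in results if r is not SHARP]
mean_ok = sum(ok) / len(ok)                    # ≈ 2/3, larger x more likely
```

The conditional mean 2/3 (rather than the unconditional 1/2) reflects the extra weight on larger values of x.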

In practice, one will usually condition the output distribution on there being no observation failure (♯). For discrete distributions, this amounts to computing:

$$\Pr[X = x \mid X \neq \sharp] = \frac{\Pr[X = x \land X \neq \sharp]}{\Pr[X \neq \sharp]} = \frac{\Pr[X = x]}{1 - \Pr[X = \sharp]}$$

where x is the outcome of the program (a value, non-termination or an error) and Pr[X = x] is the probability that the program results in x. Of course, this conditioning only works when the probability of ♯ is not 1. Note that tracking the probability of ♯ has the practical benefit of rendering the (often expensive) marginalization $\Pr[X = \sharp] = 1 - \sum\_{x \neq \sharp} \Pr[X = x]$ unnecessary.
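For a finite outcome distribution the renormalization above is a one-liner. A minimal sketch with exact arithmetic (the distribution and the `SHARP` label are hypothetical, chosen only to illustrate the formula):

```python
from fractions import Fraction as F

SHARP = "sharp"   # stand-in for the observation-failure state

# Hypothetical outcome measure of some program: Pr[X = SHARP] = 1/2.
dist = {0: F(1, 4), 1: F(1, 4), SHARP: F(1, 2)}

def condition_on_no_failure(d):
    """Pr[X = x | X != SHARP] = Pr[X = x] / (1 - Pr[X = SHARP])."""
    z = 1 - d.get(SHARP, F(0))          # must be > 0
    return {x: p / z for x, p in d.items() if x != SHARP}

cond = condition_on_no_failure(dist)    # {0: 1/2, 1: 1/2}
```

Because the weight on `SHARP` is tracked explicitly, no sum over the remaining outcomes is needed to find it.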

Other semantics often use sub-probability measures to express failed observations [4,34,35]. These semantics would say that Listing 2 results in a return value between 0 and 1 with probability 1/2 (and infer that the missing weight of 1/2 is due to failed observations). We believe one should improve upon this approach, as such semantics only implicitly state that the program sometimes fails an observation. Further, this strategy only allows tracking a single kind of exception (in this case, failed observations). This has led some works to conflate observation failure and non-termination [18,34]. We believe there is an important distinction between the two: observation failure means that the program behavior is inconsistent with observed facts, while non-termination means that the program did not return a result.

Listing 3 illustrates that it is not possible to condition parts of the program on there being no observation failure. In Listing 3, conditioning the first branch x := 0; **observe**(**flip**(1/2)) on there being no observation failure yields Pr[x = 0] = 1, rendering the observation irrelevant. The same situation arises for the second branch. Hence, conditioning the two branches in isolation yields Pr[x = 0] = 1/2 instead of Pr[x = 0] = 2/3.
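The two numbers can be checked by exact enumeration of Listing 3's four outcomes (a sketch; `SHARP` is our stand-in label for the observation-failure state):

```python
from fractions import Fraction as F

SHARP = "sharp"

# Branch 1 (prob 1/2): x := 0; observe(flip(1/2)).
# Branch 2 (prob 1/2): x := 1; observe(flip(1/4)).
outcomes = {
    0: F(1, 2) * F(1, 2),                       # branch 1, observation passes
    1: F(1, 2) * F(1, 4),                       # branch 2, observation passes
    SHARP: F(1, 2) * F(1, 2) + F(1, 2) * F(3, 4),  # either observation fails
}

# Conditioning the WHOLE program on no observation failure:
z = 1 - outcomes[SHARP]
p_x0_global = outcomes[0] / z                   # = 2/3

# Conditioning each branch IN ISOLATION makes both observations vacuous,
# so each branch returns its value with probability 1:
p_x0_per_branch = F(1, 2) * 1 + F(1, 2) * 0     # = 1/2, not 2/3
```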

*Loops.* Listing 4 shows a probabilistic program with a while loop. It samples from the **geometric**(1/2) distribution, which counts the number of failures (**flip** returns 0) until the first success occurs (**flip** returns 1). This program terminates with probability 1, but it is of course possible that a probabilistic program fails to terminate with positive probability. Listing 5 demonstrates this possibility.

Listing 5 modifies x until either x = 0 or x = 10. In each iteration, x is either increased or decreased, each with probability 1/2. If x reaches 0, the loop terminates. If x reaches 10, the loop never terminates. By symmetry, both termination and non-termination are equally likely. Hence, the program either returns 0 or does not terminate, each with probability 1/2.
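The termination probability of 1/2 can be estimated by simulation. This Python sketch exploits that once x reaches 10 the loop body never changes x again, so such runs can be classified as non-terminating without actually looping forever (`UP` is our stand-in label for the non-termination state):

```python
import random

UP = "non-termination"   # stand-in label for runs that loop forever

def run(rng: random.Random):
    x = 5
    while x > 0:
        if x < 10:
            x += 2 * rng.randrange(2) - 1   # x += 2*flip(1/2) - 1
        else:
            return UP                        # loop spins forever at x = 10
    return x                                 # terminates only with x = 0

rng = random.Random(2)
results = [run(rng) for _ in range(20000)]
p_terminate = results.count(0) / len(results)   # ≈ 1/2 by symmetry
```

This is the classic symmetric gambler's-ruin argument: starting at 5 between absorbing barriers 0 and 10, each barrier is hit with probability 1/2.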

Other semantics often use sub-probability measures to express non-termination [4,23]. Thus, these semantics would say that Listing 5 results in 0 with probability 1/2 (and nothing else). We propose to track the probability of non-termination explicitly by an additional state ↑, just as we track the probability of observation failure (♯).

*Partial Functions.* Many practically useful functions are only partial (meaning they are not defined for some inputs). Examples include **uniform**(a, b) (undefined for b < a) and √x (undefined for x < 0). Listing 6 shows an example program using √x. Usually, semantics either do not explicitly address partial functions [23,24,28,33] or use

```
if flip(1/2) {
  x:=0;
  observe(flip(1/2));
} else {
  x:=1;
  observe(flip(1/4));
}
```

Listing 3. The need for tracking ♯

```
n:=0;
while !flip(1/2) {
  n=n+1;
}
return n;
```
Listing 4. Geometric distribution

```
x := 5;
while x>0 {
  if x<10 {
    x+=2*flip(1/2)-1;
  }
}
return x;
```

Listing 5. Program that may not terminate

```
x:=uniform(-1,1);
x=√x;
return x;
```

Listing 6. Using partial functions
Fig. 1. Visual comparison of the exception handling capabilities of different semantics. For example, ↑ is filled in [34] because its semantics can handle non-termination. However, the intersection between ↑ and ♯ is not filled because [34] cannot distinguish non-termination from observation failure.

partial functions without dealing with failure (e.g. [19] uses **Bernoulli**(p) without stating what happens if p ∉ [0, 1]). Most of these languages could use a sub-probability distribution that misses weight in the presence of errors (in these languages, this results in conflating errors with non-termination and observation failures).

We introduce a third exception state ⊥ that can be produced when partial functions are evaluated outside of their domain. Thus, Listing 6 results in ⊥ with probability 1/2 and returns a value from [0, 1] with probability 1/2 (larger values are more likely). Some previous work uses an error state to capture failing computations, but does not propagate this failure implicitly [34,35]. In particular, if an early expression in a long program may fail evaluating √−4, every expression in the program that depends on this failing computation has to check whether an exception has occurred. While it may seem possible to skip the rest of the function in case of a failing computation (by applying the pattern **if** (x = ⊥) {**return** ⊥} **else** {rest of function}), this is non-modular and does not address the result of the function being used in other parts of a program.
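Implicit propagation can be sketched by wrapping every operation in a lifting that passes exception states through, so no per-use-site check is needed. The names `BOT`, `lift` and `safe_sqrt` are our own illustrative choices, not the paper's definitions:

```python
import math
import random

BOT = "error"   # stand-in for the error state produced by partial functions

def lift(f):
    """Lift f so it propagates an exception argument instead of applying f."""
    def lifted(*args):
        for a in args:
            if a is BOT:
                return a          # propagate the exception implicitly
        return f(*args)
    return lifted

def safe_sqrt(x):
    return BOT if x < 0 else math.sqrt(x)   # sqrt as a partial function

def run(rng: random.Random):
    x = rng.uniform(-1, 1)        # x := uniform(-1, 1)
    x = lift(safe_sqrt)(x)        # x = sqrt(x), may produce BOT
    x = lift(lambda v: v + 0.0)(x)  # later uses silently pass BOT along
    return x

rng = random.Random(3)
results = [run(rng) for _ in range(20000)]
p_bot = results.count(BOT) / len(results)   # ≈ 1/2
```

No `if (x = BOT)` pattern appears in `run` itself; the lifting handles it once and for all.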

Although our semantics treats ⊥ and ♯ similarly, there is an important distinction between the two: ⊥ means the program terminated due to an error, while ♯ means that according to observed evidence, the program did not actually run.

#### 2.2 Interaction of Exception States

Next, we illustrate the interaction of different exception states and explain how our semantics handles these interactions compared to existing semantics. Fig. 1 gives an overview of which existing semantics can handle which (interactions of) exceptions. We note that our semantics could easily distinguish more kinds of exceptions, such as division by zero or out-of-bounds array accesses.

*Non-termination and Observation Failure.* Listing 7 shows a program that has been investigated in [22]. Based on the observations, it only admits a single behavior, namely always sampling x = 0 in the third line. This behavior results in non-termination, but it occurs with probability 0. Hence, the program fails an observation (ending up in state ♯) with probability 1. If we try to

x:=0; **while** x=0 { x=**flip**(1/2); **observe**(x=0); }

Listing 7. Mixing loops and observations

condition on not failing any observation (by rescaling appropriately), this results in a division by 0, because the probability of not failing any observation is 0.

The semantics of Listing 7 thus only has weight on ♯, and does not allow conditioning on not failing any observation. This is also the solution that [22] proposes, but in our case, we can formally back up this claim with our semantics.

Other languages handle both non-termination and observation failure by sub-probability distributions, which makes it impossible to conclude that the missing weight is due to observation failure (and not due to non-termination) [4,24,34]. The semantics in [28] cannot directly express that the missing weight is due to observation failure (rather, the semantics are undefined due to a division by zero). However, the semantics enables a careful reader to determine that the missing weight is due to observation failure (by investigating the conditional weakest precondition and the conditional weakest liberal precondition). Some other languages can express neither while loops nor observations [23,33,35].

*Assertions and Non-termination.* For some programs, it is useful to check assumptions explicitly. For example, the implementation of the factorial function in Listing 8 explicitly checks whether x is a valid argument to the factorial function. If x ∉ **N**, the program should run into an error (i.e. only have weight on ⊥). If x ∈ **N**, the program should return x! (i.e. only have weight on x!). This example illustrates that earlier exceptions (like failing an assertion) should *bypass* later exceptions (like non-termination, which occurs for x ∉ **N** if the programmer

```
assert(x≥0);
assert(x=⌊x⌋);
fac:=1;
while x≠0 {
  fac=fac*x;
  x=x-1;
}
return fac;
```
Listing 8. Explicitly checking assumptions

forgets the first two assertions). This is not surprising, given that this is also the semantics of exceptions in most deterministic languages. Most existing semantics either cannot express Listing 8 ([23,34] have no assertions, [35] has no iteration) or cannot distinguish failing an assertion from non-termination [24,28,33]. The consequence of the latter is that removing the first two assertions from Listing 8 does not affect the semantics. Handling assertion failure by sum types (as e.g. in [34]) could be a solution, but would force the programmer to deal with assertion failure explicitly. Only the semantics in [4] has the expressiveness to implicitly handle assertion errors in Listing 8 without conflating those errors with non-termination.

Listing 9 shows a different interaction between non-termination and failing assertions. Here, even though the loop condition is always true, the first iteration of the loop will run into an exception. Thus, Listing 9 results in ⊥ with probability 1. Again, this behavior should not be surprising given the behavior of deterministic languages. For

x:=0; **while** 1 { x=x/x; }

Listing 9. Guaranteed failure

Listing 9, conflating errors with non-termination means the program semantics cannot express that the missing weight is due to an error and not due to non-termination.

*Observation Failure and Assertion Failure.* In our PPL, earlier exceptions bypass later exceptions, as illustrated in Listing 8. However, because we are operating in a probabilistic language, exceptions can occur probabilistically. Listing 10 shows a program that may run into

**observe**(**flip**(1/2)); **assert**(**flip**(1/2));

Listing 10. Observation or assertion failure

an observation failure, or into an assertion failure, or neither. If it runs into an observation failure (with probability 1/2), it bypasses the rest of the program, resulting in ♯ with probability 1/2 and in ⊥ with probability 1/4. Conditioning on the absence of observation failures, the probability of ⊥ is 1/2.
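These weights follow by exact enumeration of Listing 10; this sketch uses the hypothetical labels `SHARP`, `BOT`, `OK` for observation failure, assertion failure and success:

```python
from fractions import Fraction as F

SHARP, BOT, OK = "sharp", "bot", "ok"   # stand-in outcome labels

# observe(flip(1/2)); assert(flip(1/2));
# The earlier exception bypasses the later one: a failed observe skips the assert.
outcomes = {
    SHARP: F(1, 2),                 # observe fails
    BOT:   F(1, 2) * F(1, 2),       # observe passes, assert fails
    OK:    F(1, 2) * F(1, 2),       # both pass
}

# Conditioning on no observation failure doubles the remaining weights:
z = 1 - outcomes[SHARP]
p_bot_cond = outcomes[BOT] / z      # = 1/2

# Swapping the two statements changes the weights, even without data flow:
swapped = {
    BOT:   F(1, 2),                 # now the assert fails first
    SHARP: F(1, 2) * F(1, 2),
    OK:    F(1, 2) * F(1, 2),
}
```

The two outcome measures differ (`outcomes[SHARP] = 1/2` vs. `swapped[SHARP] = 1/4`), which is exactly the commutativity failure discussed next.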

An important observation is that reordering the two statements of Listing 10 results in a different behavior, even though there is no obvious data flow between the two statements. This is in sharp contrast to the semantics in [34], which guarantees (in the absence of exceptions) that only data flow is relevant and that expressions can be reordered. Our semantics illustrates that even if there is no explicit data dependency, some seemingly obvious properties (like commutativity) may not hold in the presence of exceptions. Some languages either cannot express Listing 10 ([23,33] lack observations), cannot distinguish observation failure from assertion failure [24], or cannot handle exceptions implicitly [34,35].

*Summary.* In this section, we showed examples of probabilistic programs that exhibit non-termination, observation failures and errors. Then, we provided examples that show how these exceptions can interact, and explained how existing semantics handle these interactions.

### 3 Preliminaries

In this section, we provide the necessary theory. Most of the material is standard; however, our treatment of exception states is interesting and important for providing semantics to probabilistic programs in the presence of exceptions. All key lemmas (together with additional definitions and examples) are proven in Appendix A.

*Natural Numbers,* [n]*, Iverson Brackets, Restriction of Functions.* We include 0 in the natural numbers, so that **N** := {0, 1, ...}. For n ∈ **N**, [n] := {1, ..., n}. The *Iverson brackets* [·] are defined by [b] = 1 if b is true and [b] = 0 if b is false. A particular application of the Iverson brackets is to characterize the indicator function of a specific set S by [x ∈ S]. For a function f : X → Y and a subset of the domain S ⊆ X, f restricted to S is denoted by $f|\_S : S \to Y$.

*Set of Variables, Generating Tuples, Preservation of Properties, Singleton Set.* Let Vars be a set of admissible variable names. We refer to the elements of Vars by x, y, z and $x\_i, y\_i, z\_i, v\_i, w\_i$, for i ∈ **N**. For v ∈ A and n ∈ **N**, $v^n := (v, \dots, v) \in A^n$ denotes the tuple containing n copies of v. A function $f : A^n \to A$ *preserves a property* if whenever $a\_1, \dots, a\_n \in A$ have that property, $f(a\_1, \dots, a\_n) \in A$ has that property. Let **1** denote the set which only contains the empty tuple (), i.e. **1** := {()}. For sets of tuples $S \subseteq \prod\_{i=1}^{n} A\_i$, there is an isomorphism $S \times \mathbf{1} \cong \mathbf{1} \times S \cong S$. This isomorphism is intuitive and we sometimes silently apply it.

*Exception States, Lifting Functions to Exception States.* We allow the extension of sets with symbols that stand for the occurrence of special events in a program. This is important because it allows us to capture the event that a given program runs into specific exceptions. Let $\mathcal{X} := \{\bot, \sharp, \uparrow\}$ be a (countable) set of exception states. We denote by $\overline{A} := A \cup \mathcal{X}$ the set A extended with $\mathcal{X}$ (we require that $A \cap \mathcal{X} = \emptyset$). Intuitively, ⊥ corresponds to assertion failures, ♯ corresponds to observation failures and ↑ corresponds to non-termination. For a function f : A → B, *f lifted to exception states*, denoted by $\overline{f}: \overline{A} \to \overline{B}$, is defined by $\overline{f}(a) = a$ if $a \in \mathcal{X}$ and $\overline{f}(a) = f(a)$ if $a \notin \mathcal{X}$. For a function $f: \prod\_{i=1}^{n} A\_i \to B$, *f lifted to exception states*, denoted by $\overline{f}: \prod\_{i=1}^{n} \overline{A\_i} \to \overline{B}$, propagates the first exception in its arguments, or evaluates f if none of its arguments are exceptions. Formally, it is defined by $\overline{f}(a\_1, \dots, a\_n) = a\_1$ if $a\_1 \in \mathcal{X}$, $\overline{f}(a\_1, \dots, a\_n) = a\_2$ if $a\_1 \notin \mathcal{X}$ and $a\_2 \in \mathcal{X}$, and so on. Only if $a\_1, \dots, a\_n \notin \mathcal{X}$ do we have $\overline{f}(a\_1, \dots, a\_n) = f(a\_1, \dots, a\_n)$. Thus, $\overline{f}(\sharp, a, \bot) = \sharp$. In particular, we write $\overline{(a, b)}$ for the lifted tupling function, so that for example $\overline{(\sharp, \uparrow)} = \sharp$. To remove notational clutter, we do not distinguish the two liftings $\overline{f}: \overline{A} \to \overline{B}$ and $\overline{f}: \prod\_{i=1}^{n} \overline{A\_i} \to \overline{B}$ notationally; whenever we write $\overline{f}$, it will be clear from the context which lifting we mean. We write $S \mathbin{\overline{\times}} T$ for $\{\overline{(s, t)} \mid s \in S, t \in T\}$.
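The n-ary lifting can be sketched for finite values in a few lines of Python; the labels `BOT`, `SHARP`, `UP` are our stand-ins for the three exception states:

```python
BOT, SHARP, UP = "bot", "sharp", "up"   # stand-ins for the exception states
EXC = {BOT, SHARP, UP}

def lift(f):
    """The lifted function propagates the FIRST exception among its
    arguments, and applies f only when no argument is exceptional."""
    def lifted(*args):
        for a in args:
            if a in EXC:
                return a
        return f(*args)
    return lifted

pair = lift(lambda a, b: (a, b))        # the lifted tupling function

assert lift(lambda a, b, c: a + b + c)(SHARP, 1, BOT) == SHARP
assert pair(SHARP, UP) == SHARP          # first exception wins
assert pair(1, 2) == (1, 2)              # no exceptions: ordinary tupling
```

The asymmetry of `pair` (the first exception wins) is what later breaks commutativity of the lifted product of measures.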

*Records.* A *record* is a special type of tuple indexed by variable names. For sets $(S\_i)\_{i \in [n]}$, a record $r \in \prod\_{i=1}^{n} (x\_i : S\_i)$ has the form $r = \{x\_1 \mapsto v\_1, \dots, x\_n \mapsto v\_n\}$, where $v\_i \in S\_i$, with the convenient shorthand $r = \{x\_i \mapsto v\_i\}\_{i \in [n]}$. We can access the elements of a record by their name: $r[x\_i] = v\_i$.

In what follows, we provide the measure theoretic background necessary to express our semantics.

σ*-algebra, Measurable Set,* σ*-algebra Generated by a Set, Measurable Space, Measurable Functions.* Let A be some set. A set $\Sigma\_A \subseteq \mathcal{P}(A)$ is called a σ*-algebra on A* if it satisfies three conditions: $A \in \Sigma\_A$, $\Sigma\_A$ is closed under complements ($S \in \Sigma\_A$ implies $A \setminus S \in \Sigma\_A$) and $\Sigma\_A$ is closed under countable unions (for any collection $\{S\_i\}\_{i \in \mathbf{N}}$ with $S\_i \in \Sigma\_A$, we have $\bigcup\_{i \in \mathbf{N}} S\_i \in \Sigma\_A$). The elements of $\Sigma\_A$ are called *measurable sets*. For any set A, a trivial σ-algebra on A is its power set $\mathcal{P}(A)$. Unfortunately, the power set often contains sets that do not behave well. To come up with a σ-algebra on A whose sets do behave well, we often start with a set $S \subseteq \mathcal{P}(A)$ that is not a σ-algebra and extend it until we get a σ-algebra. For this purpose, let A be some set and $S \subseteq \mathcal{P}(A)$ a collection of subsets of A. The σ*-algebra generated by S*, denoted by $\sigma(S)$, is the smallest σ-algebra that contains S. Formally, $\sigma(S)$ is the intersection of all σ-algebras on A containing S. For a set A and a σ-algebra $\Sigma\_A$ on A, $(A, \Sigma\_A)$ is called a *measurable space*. We often leave $\Sigma\_A$ implicit; whenever it is not mentioned explicitly, it is clear from the context. Table 1 provides the implicit σ-algebras for some common sets. As an example, some elements of $\Sigma\_{\overline{\mathbf{R}}}$ include $[0,1] \cup \{\bot\}$ and $\{1, 3, \pi\}$. For measurable spaces $(A, \Sigma\_A)$ and $(B, \Sigma\_B)$, a function $f: A \to B$ is called *measurable*,


Table 1. Implicit σ-algebras on common sets, for measurable spaces $(A, \Sigma\_A)$, $(A\_i, \Sigma\_{A\_i})$

if $\forall S \in \Sigma\_B : f^{-1}(S) \in \Sigma\_A$. Here, $f^{-1}(S) := \{a \in A : f(a) \in S\}$. If one is familiar with the notion of Lebesgue measurable functions, note that our definition does not include all Lebesgue measurable functions. As a motivation for why we need measurable functions, consider the following scenario. We know the distribution of some variable x, and want to know the distribution of y = f(x). To figure out how likely it is that y ∈ S for a measurable set S, we can determine how likely it is that $x \in f^{-1}(S)$, because $f^{-1}(S)$ is guaranteed to be a measurable set.

*Measures, Examples of Measures.* For a measurable space $(A, \Sigma\_A)$, a function $\mu: \Sigma\_A \to [0,\infty]$ is called a *measure on A* if it satisfies two properties: null empty set ($\mu(\emptyset) = 0$) and countable additivity (for any countable collection $\{S\_i\}\_{i \in I}$ of pairwise disjoint sets $S\_i \in \Sigma\_A$, we have $\mu\left(\bigcup\_{i \in I} S\_i\right) = \sum\_{i \in I} \mu(S\_i)$). Measures allow us to quantify the probability that a certain result lies in a measurable set. For example, μ([1, 2]) can be interpreted as the probability that the outcome of a process is between 1 and 2.

The *Lebesgue measure* $\lambda: \mathcal{B} \to [0,\infty]$ is the (unique) measure that satisfies $\lambda([a, b]) = b - a$ for all $a, b \in \mathbf{R}$ with $a \le b$. The *zero measure* $\mathbf{0}: \Sigma\_A \to [0,\infty]$ is defined by $\mathbf{0}(S) = 0$ for all $S \in \Sigma\_A$. For a measurable space $(A, \Sigma\_A)$ and some $a \in A$, the *Dirac measure* $\delta\_a: \Sigma\_A \to [0,\infty]$ is defined by $\delta\_a(S) = [a \in S]$.

Unfortunately, there are measures that do not satisfy some important properties (for example, they may not satisfy Fubini's theorem, which we discuss later on). The usual way to deal with this is to restrict our attention to σ-finite measures, which are well known and have been studied in great detail. However, σ-finite measures are too restrictive for our purposes. In particular, the s-finite kernels that we introduce later on can induce measures that are not σ-finite. This is why, in the following, we work with s-finite measures. Table 2 gives an overview of the different kinds of measures that are important for understanding our work. The expression $1/2 \cdot \delta\_1$ stands for the pointwise multiplication of the measure $\delta\_1$ by 1/2: $1/2 \cdot \delta\_1 = \lambda S.\, 1/2 \cdot \delta\_1(S)$. Here, the λ refers to λ-abstraction and not to the Lebesgue measure. To distinguish the two λs, we always write "λx." (with a dot) when we refer to λ-abstraction. For more details on the definitions and for proofs about the provided examples, see Appendix A.1.

Table 2. Definition and comparison of different measures $\mu: \Sigma\_A \to [0, \infty]$ on measurable spaces $(A, \Sigma\_A)$. Reading the table top-down, we get from the most restrictive definition to the most permissive definition. For example, any sub-probability measure is also a σ-finite measure. We also provide an example for each type of measure that is not an example of the more restrictive type of measure. For example, the Lebesgue measure λ is σ-finite but not a sub-probability measure.


*Product of Measures, Product of Measures in the Presence of Exception States.* For s-finite measures $\mu: \Sigma\_A \to [0,\infty]$ and $\mu': \Sigma\_B \to [0,\infty]$, we denote the *product of measures* by $\mu \times \mu': \Sigma\_{A \times B} \to [0,\infty]$, and define it by

$$(\mu \times \mu')(S) = \int\_{a \in A} \int\_{b \in B} [(a, b) \in S] \mu'(db) \mu(da)$$

For s-finite measures $\mu: \Sigma\_{\overline{A}} \to [0,\infty]$ and $\mu': \Sigma\_{\overline{B}} \to [0,\infty]$, we denote the *lifted product of measures* by $\mu \mathbin{\overline{\times}} \mu': \Sigma\_{\overline{A \times B}} \to [0,\infty]$ and define it using the lifted tupling function: $(\mu \mathbin{\overline{\times}} \mu')(S) = \int\_{a \in \overline{A}} \int\_{b \in \overline{B}} [\overline{(a, b)} \in S]\, \mu'(db)\, \mu(da)$. While the product of measures $\mu \times \mu'$ is well known for combining two measures into a joint measure, the lifted product of measures $\mu \mathbin{\overline{\times}} \mu'$ is required to do the same for measures that have weight on exception states. Because the formal semantics of our probabilistic programming language makes use of exception states, we always use $\overline{\times}$ to combine measures, appropriately handling exception states implicitly.

Lemma 1. *For measures* $\mu: \Sigma\_A \to [0,\infty]$*,* $\mu': \Sigma\_B \to [0,\infty]$*, let* $S \in \Sigma\_A$ *and* $T \in \Sigma\_B$*. Then,* $(\mu \times \mu')(S \times T) = \mu(S) \cdot \mu'(T)$*.*

For $\mu: \Sigma\_{\overline{A}} \to [0,\infty]$, $\mu': \Sigma\_{\overline{B}} \to [0,\infty]$ and $S \in \Sigma\_{\overline{A}}$, $T \in \Sigma\_{\overline{B}}$, in general we have $(\mu \mathbin{\overline{\times}} \mu')(S \mathbin{\overline{\times}} T) \neq \mu(S) \cdot \mu'(T)$, due to interactions of exception states.
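The interaction can be made concrete for finite discrete measures. This sketch (dicts from outcomes to weights; `BOT`/`SHARP`/`UP` are our stand-in exception labels) shows that the lifted product depends on the order of its factors whenever both have exceptional weight:

```python
from fractions import Fraction as F

BOT, SHARP, UP = "bot", "sharp", "up"
EXC = {BOT, SHARP, UP}

def pair(a, b):
    """Lifted tupling: the first exception among the components wins."""
    if a in EXC:
        return a
    if b in EXC:
        return b
    return (a, b)

def lifted_product(mu, nu):
    """Lifted product of finite discrete measures (dict: outcome -> weight)."""
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            k = pair(a, b)
            out[k] = out.get(k, F(0)) + p * q
    return out

mu = {0: F(1, 2), SHARP: F(1, 2)}   # half the weight is observation failure
nu = {1: F(1, 2), UP: F(1, 2)}      # half the weight is non-termination

left = lifted_product(mu, nu)       # weight 1/2 on SHARP, 1/4 on UP
right = lifted_product(nu, mu)      # weight 1/2 on UP, 1/4 on SHARP
```

The pair (SHARP, UP) contributes to SHARP in `left` but to UP in `right`, so no swapping of value pairs can reconcile the two products.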

Lemma 2. $\times$ *and* $\overline{\times}$ *for s-finite measures are associative, left- and right-distributive, and preserve (sub-)probability and s-finite measures.*

*Lebesgue Integrals, Fubini's Theorem for s-finite Measures.* Our definition of the Lebesgue integral is based on [31]. It allows integrating functions that sometimes evaluate to ∞, and Lebesgue integrals evaluating to ∞.

Here, $(A, \Sigma\_A)$ and $(B, \Sigma\_B)$ are measurable spaces and $\mu: \Sigma\_A \to [0,\infty]$ and $\mu': \Sigma\_B \to [0,\infty]$ are measures on A and B, respectively. Also, $E \in \Sigma\_A$ and $F \in \Sigma\_B$. Let $s: A \to [0,\infty)$ be a measurable function. s is a *simple function* if $s(x) = \sum\_{i=1}^{n} \alpha\_i [x \in A\_i]$ for $A\_i \in \Sigma\_A$ and $\alpha\_i \in \mathbf{R}$. For any simple function s, the Lebesgue integral of s over E with respect to μ, denoted by $\int\_{a \in E} s(a)\mu(da)$, is defined by $\sum\_{i=1}^{n} \alpha\_i \cdot \mu(A\_i \cap E)$, making use of the convention $0 \cdot \infty = 0$. Let $f: A \to [0,\infty]$ be measurable but not necessarily simple. Then, the *Lebesgue integral* of f over E with respect to μ is defined by

$$\int\_{a \in E} f(a)\mu(da) := \sup \left\{ \int\_{a \in E} s(a)\mu(da) \, \middle| \, s: A \to [0, \infty) \text{ is simple}, 0 \le s \le f \right\}$$

Here, the inequalities on functions are pointwise. Appendix A.2 lists some useful properties of the Lebesgue integral. Here, we only mention Fubini's theorem, which is important because it entails a commutativity-like property of the product of measures: $(\mu \times \mu')(S) = (\mu' \times \mu)(\mathrm{swap}(S))$, where swap switches the dimensions of S: $\mathrm{swap}(S) = \{(b, a) \mid (a, b) \in S\}$. The proof of this property is straightforward, by expanding the definition of the product of measures and applying Fubini's theorem. As we show in Sect. 5, this property is crucial for the commutativity of expressions. In the presence of exceptions, it does not hold: $(\mu \mathbin{\overline{\times}} \mu')(S) \neq (\mu' \mathbin{\overline{\times}} \mu)(\mathrm{swap}(S))$ in general.

Theorem 1 (Fubini's theorem). *For s-finite measures* $\mu: \Sigma\_A \to [0,\infty]$ *and* $\mu': \Sigma\_B \to [0,\infty]$ *and any measurable function* $f: A \times B \to [0,\infty]$*,*

$$\int\_{a \in A} \int\_{b \in B} f(a, b) \mu'(db) \mu(da) = \int\_{b \in B} \int\_{a \in A} f(a, b) \mu(da) \mu'(db)$$

*For s-finite measures* $\mu: \Sigma\_{\overline{A}} \to [0,\infty]$ *and* $\mu': \Sigma\_{\overline{B}} \to [0,\infty]$ *and any measurable function* $f: A \times B \to [0,\infty]$*,*

$$\int\_{a \in \overline{A}} \int\_{b \in \overline{B}} \overline{f}(a, b) \mu'(db) \mu(da) = \int\_{b \in \overline{B}} \int\_{a \in \overline{A}} \overline{f}(a, b) \mu(da) \mu'(db)$$

*(Sub-)probability Kernels, s-finite Kernels, Dirac Delta, Lebesgue Kernel, Motivation for s-finite Kernels.* In the following, let $(A, \Sigma\_A)$ and $(B, \Sigma\_B)$ be measurable spaces. A *(sub-)probability kernel with source A and target B* is a function $\kappa: A \times \Sigma\_B \to [0,\infty)$ such that for all $a \in A$, $\kappa(a, \cdot): \Sigma\_B \to [0,\infty)$ is a (sub-)probability measure, and for all $S \in \Sigma\_B$, $\kappa(\cdot, S): A \to [0,\infty)$ is measurable. $\kappa: A \times \Sigma\_B \to [0,\infty]$ is an *s-finite kernel with source A and target B* if κ is a pointwise sum of sub-probability kernels $\kappa\_i: A \times \Sigma\_B \to [0,\infty)$, meaning $\kappa = \sum\_{i \in \mathbf{N}} \kappa\_i$. We denote the set of s-finite kernels with source A and target B by $A \to B \subseteq A \times \Sigma\_B \to [0,\infty]$. Because we only ever deal with s-finite kernels, we often refer to them simply as kernels.

We can understand the Dirac measure as a probability kernel. For a measurable space $(A, \Sigma\_A)$, the *Dirac delta* $\delta: A \to A$ is defined by $\delta(a, S) = [a \in S]$. Note that for any a, $\delta(a, \cdot): \Sigma\_A \to [0,\infty]$ is the Dirac measure. We often write $\delta(a)(S)$ or $\delta\_a(S)$ for $\delta(a, S)$. Note that we can also interpret $\delta: A \to A$ as an s-finite kernel from $A \to B$ for $A \subseteq B$. The *Lebesgue kernel* $\lambda^\*: A \to \mathbf{R}$ is defined by $\lambda^\*(a)(S) = \lambda(S)$, where λ is the Lebesgue measure. The definition of s-finite kernels is a lifting of the notion of s-finite measures. Note that for an s-finite kernel κ, $\kappa(a, \cdot)$ is an s-finite measure for all $a \in A$. In the context of probabilistic programming, s-finite kernels have been used before [34].

Working in the space of sub-probability kernels is inconvenient because, for example, $\lambda^\*: \mathbf{R} \to \mathbf{R}$ is not a sub-probability kernel. Even though $\lambda^\*(x)$ is a σ-finite measure for all $x \in \mathbf{R}$, not all s-finite kernels induce σ-finite measures in this sense. As an example, $(\lambda^\*; \lambda^\*)(x)$ is not a σ-finite measure for any $x \in \mathbf{R}$ (see Lemma 15 in Appendix A.1). We introduce (;) shortly, in Definition 1.

Working in the space of s-finite kernels is convenient because s-finite kernels have many nice properties. In particular, the set of s-finite kernels A → B is the smallest set that contains all sub-probability kernels with source A and target B and is closed under countable sums.

*Lifting Kernels to Exception States, Removing Weight from Exception States.* For kernels κ: A → B or κ: A → B̄, the kernel κ *lifted to exception states*, written κ̄: Ā → B̄, is defined by κ̄(a) = κ(a) if a ∈ A and κ̄(a) = δ(a) if a ∉ A; that is, κ̄ passes exception states through unchanged. Lifting κ to κ̄ preserves (sub-)probability and s-finite kernels.

*Composing kernels, composing kernels in the presence of exception states.*

Definition 1. *Let* (;): (A → B) → (B → C) → (A → C) *be defined by*

$$(f;g)(a)(S) = \int_{b \in B} g(b)(S)\, f(a)(db).$$

Note that f;g intuitively corresponds to first applying f and then g. Throughout this paper, we mostly use >=> instead of (;), but we introduce (;) because it is well-known and it is instructive to show how our definition of >=> relates to (;).

Lemma 3. (;) *is associative, left- and right-distributive, has neutral element*<sup>2</sup> δ *and preserves (sub-)probability and s-finite kernels.*

Definition 2. *Let* (>=>): (A → B̄) → (B → C̄) → (A → C̄) *be defined by*

$$(f \mathbin{>\!\!=\!\!>} g)(a)(S) = \int_{b \in \overline{B}} \overline{g}(b)(S)\, f(a)(db).$$

We sometimes write f(a) >=> g for (f >=> g)(a).

Lemma 4. *For* <sup>f</sup> : <sup>A</sup> → <sup>B</sup> *and* <sup>g</sup> : <sup>B</sup> → <sup>C</sup>*,* <sup>a</sup> <sup>∈</sup> <sup>A</sup> *and* <sup>S</sup> <sup>∈</sup> <sup>Σ</sup>C *,*

$$(f \mathbin{>\!\!=\!\!>} g)(a)(S) = (f;g)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\, f(a)(\{x\})$$

Lemma 4 shows how >=> relates to (;), by splitting f >=> g into the non-exceptional behavior of f (handled by (;)) and the exceptional behavior of f (handled by the sum). Intuitively, if f produces an exception state x ∈ X, then g is not even evaluated. Instead, this exception is directly passed on, as indicated by δ(x)(S).

<sup>2</sup> δ is a neutral element of (;) if (δ;κ)=(κ;δ) = κ for all kernels κ.

If f(a)(X) = 0 for all a ∈ A, or if S ∩ X = ∅, then the definitions are equivalent in the sense that (f;g)(a)(S) = (f >=> g)(a)(S). The difference between >=> and (;) is the treatment of exception states produced by f. Note that technically, the target B̄ of f: A → B̄ does not match the source B of g: B → C̄. Therefore, to formally interpret f;g, we silently restrict the domain of f to A × Σ_B.
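The difference can be made concrete in a discrete sketch (hypothetical names; exception states are modeled as the strings below, and measures as dicts from outcomes to weights):

```python
EXCEPTIONS = {"ERROR", "OBS_FAIL", "NONTERM"}  # models the set X

def fish(f, g):
    """(f >=> g): run f, feed non-exceptional outcomes to g, and pass
    exceptional outcomes of f through unchanged (g is not evaluated)."""
    def h(a):
        out = {}
        for b, w in f(a).items():
            # Lifting g to exception states: on x in X, behave like a Dirac delta.
            inner = {b: 1.0} if b in EXCEPTIONS else g(b)
            for c, w2 in inner.items():
                out[c] = out.get(c, 0.0) + w * w2
        return out
    return h

half_fail = lambda a: {a: 0.5, "ERROR": 0.5}  # fails half of the time
succ = lambda b: {b + 1: 1.0}                 # deterministic successor

# The error weight of the first kernel bypasses the second kernel.
assert fish(half_fail, succ)(0) == {1: 0.5, "ERROR": 0.5}
```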

Lemma 5. >=> *is associative, left-distributive (but not right-distributive), has neutral element* δ *and preserves (sub-)probability and s-finite kernels.*

*Product of Kernels, Product of Kernels in the Presence of Exception States.* For s-finite kernels κ: A → B and κ′: A → C, we define the *product of kernels*, denoted κ × κ′: A → B × C, by (κ × κ′)(a)(S) = (κ(a) × κ′(a))(S). For s-finite kernels κ: A → B̄ and κ′: A → C̄, we define the *lifted product of kernels*, denoted κ ×̄ κ′: A → B × C, by (κ ×̄ κ′)(a)(S) = (κ(a) ×̄ κ′(a))(S). × and ×̄ allow us to combine two kernels into a joint kernel. Essentially, this definition reduces the product of kernels to the product of measures.

Lemma 6. × *and* ×̄ *on kernels preserve (sub-)probability and s-finite kernels, and are associative as well as left- and right-distributive.*

*Binding Conventions.* To avoid too many parentheses, we make use of some binding conventions, ordering the operators by decreasing binding strength: ×, ×̄, ;, >=>, +.

*Summary.* The most important concepts introduced in this section are exception states, records, Lebesgue integration, Fubini's theorem and (s-finite) kernels.

### 4 A Probabilistic Language and Its Semantics

We now describe our probabilistic programming language, the typing rules and the denotational semantics of our language.

### 4.1 Syntax

Let **V** := **Q** ∪ {π, e} ⊆ **R** be a (countable) set of constants expressible in our programs. Let i, n ∈ **N**, r ∈ **V**, x ∈ Vars, ⊖ a generic unary operator (e.g., − inverts the sign of a value, ! is logical negation mapping 0 to 1 and all other numbers to 0, ⌊·⌋ and ⌈·⌉ round down and up, respectively), and ⊕ a generic binary operator (e.g., +, −, ∗, /, ^ for addition, subtraction, multiplication, division and exponentiation, &&, || for logical conjunction and disjunction, and =, ≠, <, ≤, >, ≥ to compare values). Let f: A → **R** → [0, ∞) be a measurable function that maps a ∈ A to a probability density function; we check that f is measurable by uncurrying it to a function A × **R** → [0, ∞). Fig. 2 shows the syntax of our language.

Our expressions capture () (the only element of **1**), r (real numbers), x (variables), (e1,...,en) (tuples), e[i] (accessing elements of tuples for i ∈ **N**), ⊖e (unary operators), e1 ⊕ e2 (binary operators), e1[e2] (accessing array elements), e1[e2 ↦ e3] (updating array elements), **array**(e1, e2) (creating an array of length e1

$$\begin{aligned}
e &::= () \mid r \mid x \mid (e_1, \ldots, e_n) \mid e[i] \mid \ominus e \mid e_1 \oplus e_2 \mid e_1[e_2] \mid && \textbf{(Expressions)} \\
&\phantom{::=\ } \mathtt{array}(e_1, e_2) \mid e_1[e_2 \mapsto e_3] \mid F(e) && \\
F &::= \lambda x.\{P;\ \mathtt{return}\ e;\} \mid \mathtt{flip}(e) \mid \mathtt{uniform}(e_1, e_2) \mid \mathtt{sampleFrom}_f(e) && \textbf{(Functions)} \\
P &::= \mathtt{skip} \mid x := e \mid x = e \mid P_1; P_2 \mid \mathtt{if}\ e\ \{P_1\}\ \mathtt{else}\ \{P_2\} \mid \{P\} \mid && \textbf{(Statements)} \\
&\phantom{::=\ } \mathtt{assert}(e) \mid \mathtt{observe}(e) \mid \mathtt{while}\ e\ \{P\} &&
\end{aligned}$$

Fig. 2. The syntax of our probabilistic language.

containing e<sup>2</sup> at every index) and F(e) (evaluating function F on argument e). To handle functions <sup>F</sup>(e1,...,en) with multiple arguments, we interpret (e1,...,en) as a tuple and apply F to that tuple.

Our functions express λx.{P; **return** e; } (a function taking argument x, running P on x and returning e), **flip**(e) (random choice from {0, 1}, returning 1 with probability e), **uniform**(e1, e2) (continuous uniform distribution between e1 and e2) and **sampleFrom**_f(e) (a value sampled according to the probability density function f(e)). An example for f is the density of the exponential distribution, indexed by its rate λ. Formally, f: (0, ∞) → **R** → [0, ∞) is defined by f(λ)(x) = λe^(−λx) if x ≥ 0 and f(λ)(x) = 0 otherwise. Often, f is partial (e.g., λ ≤ 0 is not allowed). Intuitively, arguments outside the allowed range of f produce the error state ⊥.
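As a discrete sketch of the first of these (hypothetical names; the error state is modeled as the string "ERROR", and a measure as a dict from outcomes to weights), **flip** can be rendered as:

```python
def flip(p):
    """Semantics of flip(p): weight p on 1, weight 1-p on 0; arguments
    outside [0, 1] put all weight on the error state."""
    if 0.0 <= p <= 1.0:
        return {1: p, 0: 1.0 - p}
    return {"ERROR": 1.0}

assert flip(0.25) == {1: 0.25, 0: 0.75}
assert flip(2.0) == {"ERROR": 1.0}  # out-of-range argument: error state
```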

Our statements express **skip** (no operation), x := e (assigning to a fresh variable), x = e (assigning to an existing variable), P1; P2 (sequential composition of programs), **if** e {P1} **else** {P2} (if-then-else), {P} (static scoping), **assert**(e) (asserting that an expression evaluates to true, where assertion failure results in ⊥), **observe**(e) (observing that an expression evaluates to true, where observation failure results in ↯) and **while** e {P} (while loops, where non-termination results in ⇑). We additionally introduce syntactic sugar e1[e2] = e3 for e1 = e1[e2 ↦ e3], **if** (e) {P} for **if** e {P} **else** {**skip**}, and func(e2) for λx.{P; **return** e1; }(e2) (using the name func for the function with argument x and body {P; **return** e1}).

### 4.2 Typing Judgments

Let n ∈ **N**. We define types by the following grammar in BNF, where τ[] denotes arrays over type τ. We sometimes write ∏_{i=1}^{n} τ_i for the product type τ_1 × ⋯ × τ_n.

$$\tau ::= \mathbb{1} \mid \mathbb{R} \mid \tau[\,] \mid \tau_1 \times \dots \times \tau_n$$

Note that we also use the type τ<sup>1</sup> → τ<sup>2</sup> of kernels with source τ<sup>1</sup> and target τ2, but we do not list it here to avoid higher-order functions (discussed in Sect. 4.5).

Formally, a *context* Γ is a set {x_i : τ_i}_{i∈[n]} that assigns a type τ_i to each variable x_i ∈ Vars. In slight abuse of notation, we sometimes write x ∈ Γ if there is a type τ with x: τ ∈ Γ. We also write Γ, x: τ for Γ ∪ {x: τ} (where x ∉ Γ) and Γ, Γ′ for Γ ∪ Γ′ (where Γ and Γ′ have no common variables).

Fig. 3. The typing rules for expressions and functions in our language

Fig. 4. The typing rules for statements

The rules in Figs. 3 and 4 allow deriving the type of expressions, functions and statements. To state that an expression e has type τ under a context Γ, we write Γ ⊢ e: τ. Likewise, ⊢ F: τ → τ′ indicates that F is a kernel from τ to τ′. Finally, Γ ⊢ P ⊣ Γ′ states that the context Γ is transformed into Γ′ by the statement P. For **sampleFrom**_f, we intuitively want f to map values from τ to probability density functions. To allow f to be partial, i.e., to be undefined for some values from τ, we use a set A ∈ Σ_τ (and hence A ⊆ [[τ]]) as the domain of f (see Sect. 4.3).

#### 4.3 Semantics

*Semantic Domains.* We assign to each type τ a set [[τ]] together with an implicit σ-algebra Σ_τ on that set. Additionally, we assign a set [[Γ]] to each context Γ = {x_i : τ_i}_{i∈[n]}. Concretely, we have [[**1**]] = **1** := {()} with Σ_**1** = {∅, **1**}, [[**R**]] = **R** and Σ_**R** = B (the Borel sets). The remaining semantic domains are outlined in Fig. 5.

$$\begin{aligned}
[\![\tau[\,]]\!] &= \bigcup_{i \in \mathbb{N}} [\![\tau]\!]^i & \Sigma_{\tau[\,]} &\text{ is generated by } \bigcup_{i \in \mathbb{N}} \Big\{ \prod_{j=1}^i S_j \;\Big|\; S_j \in \Sigma_{\tau} \Big\} \\
[\![\tau_1 \times \cdots \times \tau_n]\!] &= \prod_{i=1}^n [\![\tau_i]\!] & \Sigma_{\tau_1 \times \cdots \times \tau_n} &\text{ is generated by } \Big\{ \prod_{i=1}^n S_i \;\Big|\; S_i \in \Sigma_{\tau_i} \Big\} \\
[\![\Gamma]\!] &= \prod_{i=1}^n (x_i \colon [\![\tau_i]\!]) & \Sigma_{\Gamma} &\text{ is generated by } \Big\{ \prod_{i=1}^n (x_i \colon S_i) \;\Big|\; S_i \in \Sigma_{\tau_i} \Big\}
\end{aligned}$$

### Fig. 5. Semantic domains for types

Fig. 6. The semantics of expressions. <sup>v</sup>!<sup>n</sup> stands for the <sup>n</sup>-tuple (v,..., v). <sup>t</sup>[i] stands for the i-th element (0-indexed) of the tuple t and t[i → v] is the tuple t, where the i-th element is replaced by v. |t| is the length of a tuple t. σ stands for a program state over all variables in some Γ, with σ ∈ [[Γ]].

*Expressions.* Fig. 6 assigns to each expression e typed by Γ e: τ a probability kernel [[e]]τ : [[Γ]] → [[<sup>τ</sup> ]]. When <sup>τ</sup> is irrelevant or clear from the context, we may drop it and write [[e]]. The formal interpretation of [[Γ]] → [[τ ]] is explained in Sect. 3. <sup>3</sup> Note that Fig. 6 is incomplete, but extending it is straightforward. When we need to evaluate multiple terms (as in (e1,...,en)), we combine the results using ×. This makes sure that in the presence of exceptions, the first exception that occurs will have priority over later exceptions. In addition, deterministic functions (like x+y) are lifted to probabilistic functions by the Dirac delta (e.g. δ(x+y)) and incomplete functions (like x/y) are lifted to complete functions via the explicit error state ⊥.

<sup>3</sup> As a quick and intuitive reminder, κ: A → B means that for every a ∈ A, κ(a) will be a distribution over B̄, where B̄ is B enriched with exception states. Hence, κ(a) may have weight on elements of B, on exception states, or on both.

Fig. 7 assigns to each function F typed by F : τ<sup>1</sup> → τ<sup>2</sup> a probability kernel [[F]]τ1 →τ<sup>2</sup> : [[τ1]] → [[τ2]]. In the semantics of **flip**, <sup>δ</sup>(1): <sup>Σ</sup>**<sup>R</sup>** <sup>→</sup> [0,∞] is a measure on **R**, and p · δ(1) rescales this measure pointwise. Similarly, the sum p · δ(1) + (1 − p)· δ(0) is also meant pointwise, resulting in a measure on **R**. Finally, λp. p · <sup>δ</sup>(1)+ (1−p)·δ(0) is a kernel with source [0, 1] and target **<sup>R</sup>**. For **sampleFrom**f (e), remember that f(p)(·) is a probability density function.

$$\begin{aligned}
[\![\mathtt{flip}]\!]_{\mathbb{R}\to\mathbb{R}} &= \lambda p. \begin{cases} p \cdot \delta(1) + (1-p) \cdot \delta(0) & p \in [0,1] \\ \delta(\bot) & \text{otherwise} \end{cases} \\
[\![\mathtt{uniform}]\!]_{\mathbb{R}\times\mathbb{R}\to\mathbb{R}} &= \lambda(l,r). \begin{cases} \lambda S.\ \tfrac{1}{r-l}\, \lambda([l,r] \cap S) & l < r \\ \delta(\bot) & \text{otherwise} \end{cases} \\
[\![\mathtt{sampleFrom}_f]\!]_{\tau\to\mathbb{R}} &= \lambda p. \begin{cases} \lambda S. \int_{x\in\mathbb{R}\cap S} f(p)(x)\, \lambda(dx) & p \in A \\ \delta(\bot) & p \notin A \end{cases} \\
[\![\lambda x.\{P;\ \mathtt{return}\ e;\}]\!]_{\tau_1\to\tau_2} &= \lambda v.\ \delta(\{x \mapsto v\}) \mathbin{>\!\!=\!\!>} [\![P]\!] \mathbin{>\!\!=\!\!>} [\![e]\!]_{\tau_2}
\end{aligned}$$

Fig. 7. The semantics of functions.

$$\begin{aligned}
[\![\mathtt{skip}]\!] &= \delta \qquad & [\![x := e]\!] &= [\![x = e]\!] = \delta \overline{\times} [\![e]\!] \mathbin{>\!\!=\!\!>} \lambda(\sigma, v).\ \delta(\sigma[x \mapsto v]) \\
[\![P_1; P_2]\!] &= [\![P_1]\!] \mathbin{>\!\!=\!\!>} [\![P_2]\!] \qquad & [\![\{P\}]\!] &= [\![P]\!] \mathbin{>\!\!=\!\!>} \lambda \sigma'.\ \delta(\sigma'(\Gamma)) \\
[\![\mathtt{if}\ e\ \{P_1\}\ \mathtt{else}\ \{P_2\}]\!] &= \multicolumn{3}{l}{\delta \overline{\times} [\![e]\!]_{\mathbb{R}} \mathbin{>\!\!=\!\!>} \lambda(\sigma, b). \begin{cases} [\![P_1]\!](\sigma) & b \neq 0 \\ [\![P_2]\!](\sigma) & b = 0 \end{cases}} \\
[\![\mathtt{assert}(e)]\!] &= \multicolumn{3}{l}{\delta \overline{\times} [\![e]\!]_{\mathbb{R}} \mathbin{>\!\!=\!\!>} \lambda(\sigma, b). \begin{cases} \delta(\sigma) & b \neq 0 \\ \delta(\bot) & b = 0 \end{cases}} \\
[\![\mathtt{observe}(e)]\!] &= \multicolumn{3}{l}{\delta \overline{\times} [\![e]\!]_{\mathbb{R}} \mathbin{>\!\!=\!\!>} \lambda(\sigma, b). \begin{cases} \delta(\sigma) & b \neq 0 \\ \delta(↯) & b = 0 \end{cases}}
\end{aligned}$$

Fig. 8. The semantics of programs in our probabilistic language. Here, σ[x ↦ v] results in σ with the value stored under x updated to v. σ′(Γ) selects only those variables from σ′ that occur in Γ, meaning {x_i ↦ v_i}_{i∈I}({x_i : τ_i}_{i∈I′}) = {x_i ↦ v_i}_{i∈I∩I′}.

*Statements.* Fig. 8 assigns to each statement P with Γ ⊢ P ⊣ Γ′ a probability kernel [[P]]: [[Γ]] → [[Γ′]]. Note the use of ×̄ in δ ×̄ [[e]], which allows evaluating e while keeping the state σ in which e is being evaluated. Intuitively, if evaluating e results in an exception from X, the previous state σ is irrelevant, and the result of δ ×̄ [[e]] will be that exception from X.
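As a discrete sketch of the **assert** and **observe** cases of Fig. 8 (hypothetical names; program states are dicts, and the states ⊥ and ↯ are modeled as the strings "ERROR" and "OBS_FAIL"), the only difference between the two statements is which exception state receives the weight on failure:

```python
def run_assert(sigma, e):
    """[[assert(e)]]: keep the state if e holds, else all weight on the error state."""
    return {tuple(sigma.items()): 1.0} if e(sigma) else {"ERROR": 1.0}

def run_observe(sigma, e):
    """[[observe(e)]]: keep the state if e holds, else all weight on observe failure."""
    return {tuple(sigma.items()): 1.0} if e(sigma) else {"OBS_FAIL": 1.0}

sigma = {"x": 0}
cond = lambda s: s["x"] == 1  # fails on this state

assert run_assert(sigma, cond) == {"ERROR": 1.0}
assert run_observe(sigma, cond) == {"OBS_FAIL": 1.0}
```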

*While Loop.* To define the semantics of the while loop **while** <sup>e</sup> {P}, we introduce <sup>a</sup> *kernel transformer* [[**while** <sup>e</sup> {P}]]trans : ([[Γ]] → [[Γ]]) <sup>→</sup> ([[Γ]] → [[Γ]]) that transforms the semantics for n runs of the loop to the semantics for n + 1 runs of the loop. Concretely,

$$[\![\mathtt{while}\ e\ \{P\}]\!]^{\mathrm{trans}}(\kappa) = \delta \overline{\times} [\![e]\!]_{\mathbb{R}} \mathbin{>\!\!=\!\!>} \lambda(\sigma, b). \begin{cases} [\![P]\!](\sigma) \mathbin{>\!\!=\!\!>} \kappa & b \neq 0 \\ \delta(\sigma) & b = 0 \end{cases}$$

This semantics first evaluates e, while keeping the program state around using δ. If e evaluates to 0, the while loop terminates and we return the current program state σ. If e does not evaluate to 0, we run the loop body P and feed the result to the next iteration of the loop, using κ.

We can then define the semantics of **while** e {P} using a special fixed point operator fix: ((A → A) → (A → A)) → (A → A), defined by the pointwise limit fix(Δ) = lim_{n→∞} Δ^n(λσ. δ(⇑)), where ⇑ is the non-termination state and Δ^n denotes the n-fold composition of Δ. Δ^n(λσ. δ(⇑)) puts all runs of the while loop that do not terminate within n steps into the state ⇑. In the limit, only the runs of the loop that never terminate contribute weight to ⇑. fix(Δ) is only defined if its pointwise limit exists. Making use of fix, we can define the semantics of the while loop as follows:

$$[\![\mathtt{while}\ e\ \{P\}]\!] = \mathrm{fix}\big([\![\mathtt{while}\ e\ \{P\}]\!]^{\mathrm{trans}}\big).$$

Lemma 7. *For* Δ *as in the semantics of the while loop, and for each* σ *and each* S*, the limit* lim_{n→∞} Δ^n(λσ′. δ(⇑))(σ)(S) *exists.*

Lemma 7 holds because increasing n can only shift probability mass from ⇑ to other states (we provide a formal proof in Appendix B). Kozen shows a different way of defining the semantics of the while loop [23], using least fixed points. Lemma 8 describes the relation of the semantics of our while loop to the semantics of the while loop of [23]. For more details on the formal interpretation of Lemma 8 and for its proof, see Appendix B.
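The fixed-point construction can be illustrated with a discrete sketch for the loop **while** x { x = **flip**(1/2) } (hypothetical names; the non-termination state is modeled as the string "NONTERM"): iterating the kernel transformer moves weight from non-termination to terminated states, and the limit exists because weight only ever leaves "NONTERM".

```python
def transformer(kappa):
    """One unrolling of "while x { x = flip(1/2) }": if x != 0, flip a
    fair coin for the new x and continue with kappa; if x == 0, stop."""
    def run(x):
        if x == 0:
            return {0: 1.0}  # loop guard fails: return the current state
        out = {}
        for x2, w in {1: 0.5, 0: 0.5}.items():  # x = flip(1/2)
            for s, w2 in kappa(x2).items():
                out[s] = out.get(s, 0.0) + w * w2
        return out
    return run

bottom = lambda x: {"NONTERM": 1.0}  # base case: nothing terminates yet

kappa = bottom
for _ in range(20):
    kappa = transformer(kappa)

# The residual weight on non-termination halves with each unrolling...
assert kappa(1).get("NONTERM", 0.0) < 1e-5
# ...while the total weight always stays 1 (a probability kernel).
assert abs(sum(kappa(1).values()) - 1.0) < 1e-12
```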

Lemma 8. *In the absence of exception states, and using sub-probability kernels instead of distribution transformers, the definition of the semantics of the while loop from [23] is equivalent to ours.*

Theorem 2. *The semantics of each expression* [[e]] *and statement* [[P]] *is indeed a probability kernel.*

*Proof.* The proof proceeds by induction. Some lemmas that are crucial for the proof are listed in Appendix C. Conveniently, most functions that come up in our definition are continuous (like a + b) or continuous except on some countable subset (like a/b), and thus measurable.

### 4.4 Recursion

To extend our language with recursion, we apply the same ideas as for the while loop. Given the source code of a function F that uses recursion, we define its

$$[\![\mathtt{geom}]\!]^{\mathrm{trans}}(\kappa) = \delta \overline{\times} [\![\mathtt{flip}(\tfrac{1}{2})]\!] \mathbin{>\!\!=\!\!>} \lambda(\sigma, b). \begin{cases} \big(\kappa \overline{\times} [\![1]\!] \mathbin{>\!\!=\!\!>} \lambda(x, y).\ \delta(x + y)\big)(\sigma) & b \neq 0 \\ \delta(0) & b = 0 \end{cases}$$

Fig. 9. Kernel transformer [[geom]]trans(κ) for geom given in Listing 11.

semantics in terms of a kernel transformer [[F]]^trans. This kernel transformer takes semantics for F up to a recursion depth of n and returns semantics for F up to recursion depth n + 1. Formally, [[F]]^trans(κ) follows the usual semantics, but uses κ as the semantics for recursive calls to F (we will provide an example shortly). Finally, we define the semantics of F by [[F]] := fix([[F]]^trans). Just as for the while loop, fix([[F]]^trans) is well-defined because stepping from recursion depth n to n + 1 can only shift probability mass from ⇑ to other states. We note that we could generalize our approach to mutual recursion.

To demonstrate how we define the kernel transformer, consider the recursive implementation of the geometric distribution in Listing 11 (to simplify presentation, Listing 11 uses early return). Given semantics <sup>κ</sup> for geom : **<sup>1</sup>** → **R** up to recursion depth n, we can define the semantics of geom up to recursion depth <sup>n</sup> + 1, as illustrated in Fig. 9.


### 4.5 Higher-Order Functions

Our language cannot express higher-order functions. When trying to give semantics to higher-order probabilistic programs, an important step is to define a σ-algebra on the set of functions from real numbers to real numbers. Unfortunately, no matter which σ-algebra is picked, function evaluation (i.e., the function that takes f and x as arguments and returns f(x)) is not measurable [1]. This is a known limitation that previous work has looked into (e.g., [35] addresses it by restricting the set of functions to those expressible by their source code).

A promising recent approach is replacing measurable spaces by quasi-Borel spaces [16]. This allows expressing higher-order functions, at the price of replacing the well-known and well-understood measurable spaces by a new concept.

#### 4.6 Non-determinism

To extend our language with non-determinism, we may define the semantics of expressions, functions and statements in terms of sets of kernels. For an expression <sup>e</sup> typed by <sup>Γ</sup> <sup>e</sup> : <sup>τ</sup> , this means that [[e]]τ ∈ P ([[Γ]] → [[<sup>τ</sup> ]]), where <sup>P</sup> (S) denotes the power set of S. Lifting our semantics to non-determinism is mostly straightforward, except for loops. There, [[**while** <sup>e</sup> {P}]] contains all kernels of the form limn→∞(Δ<sup>1</sup> ◦··· ◦ <sup>Δ</sup>n)(-), where <sup>Δ</sup>i <sup>∈</sup> [[**while** <sup>e</sup> {P}]]trans. Previous work has studied non-determinism in more detail, see e.g. [21,22].

### 5 Properties of Semantics

We now investigate two properties of our semantics: commutativity and associativity. These are useful in practice, e.g. because they enable rewriting programs to a form that allows for more efficient inference [5].

In this section, we write e<sup>1</sup> e<sup>2</sup> when expressions e<sup>1</sup> and e<sup>2</sup> are equivalent (i.e. when [[e1]] = [[e2]]). Analogously, we write P<sup>1</sup> P<sup>2</sup> for [[P1]] = [[P2]].

### 5.1 Commutativity

In the presence of exception states, our language cannot guarantee commutativity of expressions such as e<sup>1</sup> + e2. This is not surprising, as in our semantics the first exception bypasses all later exceptions.

Lemma 9. *For function* <sup>F</sup>(){**while** <sup>1</sup> {**skip**}; **return** <sup>0</sup>}*,*

$$\frac{1}{0} + F() \not\equiv F() + \frac{1}{0}$$

Formally, this is because if we evaluate 1/0 first, we only have weight on ⊥. If instead we evaluate F() first, we only have weight on ⇑, by an analogous calculation. A more detailed proof is included in Appendix D.
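This calculation can be replayed in a discrete sketch (hypothetical names; "ERROR" models ⊥ and "NONTERM" models the non-termination state):

```python
EXC = {"ERROR", "NONTERM"}

div_by_zero = {"ERROR": 1.0}  # semantics of 1/0: all weight on the error state
f_call = {"NONTERM": 1.0}     # semantics of F(): all weight on non-termination

def plus(m1, m2):
    """Left-to-right sum: evaluate m1, then m2; the first exception
    encountered bypasses the rest of the computation."""
    out = {}
    for v1, w1 in m1.items():
        if v1 in EXC:
            out[v1] = out.get(v1, 0.0) + w1
            continue
        for v2, w2 in m2.items():
            r = v2 if v2 in EXC else v1 + v2
            out[r] = out.get(r, 0.0) + w1 * w2
    return out

# Swapping the summands swaps which exception state gets the weight.
assert plus(div_by_zero, f_call) == {"ERROR": 1.0}
assert plus(f_call, div_by_zero) == {"NONTERM": 1.0}
```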

However, the only reason for non-commutativity is the presence of exceptions. Assuming that e<sup>1</sup> and e<sup>2</sup> cannot produce exceptions, we obtain commutativity:

Lemma 10. *If* [[e1]](σ)(X ) = [[e2]](σ)(X )=0 *for all* σ*, then* e<sup>1</sup> ⊕ e<sup>2</sup> e<sup>2</sup> ⊕ e1*, for any commutative operator* ⊕*.*

The proof of Lemma 10 (provided in Appendix D) relies on the absence of exceptions and Fubini's Theorem. This commutativity result is in line with the results from [34], which proves commutativity in the absence of exceptions.

In the analogous situation for statements, we cannot assume commutativity P1; P<sup>2</sup> P2; P1, even if there is no dataflow from P<sup>1</sup> to P2. We already illustrated this in Listing 10, where swapping two lines changes the program semantics. However, in the absence of exceptions and dataflow from P<sup>1</sup> to P2, we can guarantee P1; P<sup>2</sup> P2; P1.

#### 5.2 Associativity

A careful reader might suspect that since commutativity does not always hold in the presence of exceptions, a similar situation might arise for associativity of some expressions. As an example, can we guarantee e1+(e2+e3) (e1+e2)+e3, even in the presence of exceptions? The answer is yes, intuitively because exceptions can only change the behavior of a program if the order of their occurrence is changed. This is not the case for associativity. Formally, we derive the following:

Lemma 11. e<sup>1</sup> ⊕ (e<sup>2</sup> ⊕ e3) (e<sup>1</sup> ⊕ e2) ⊕ e3*, for any associative operator* ⊕*.*

We include notes on the proof of Lemma 11 in Appendix D, mainly relying on the associativity of × (Lemma 6). Likewise, sequential composition is associative: (P1; P2); P<sup>3</sup> P1; (P2; P3). This is due to the associativity of >=> (Lemma 5).

### 5.3 Adding the **score** Primitive

Some languages include the primitive **score**, which allows increasing or decreasing the probability of a certain event (or trace) [34,35].

Listing 12 shows an example program using **score**. Without normalization, it returns 0 with probability 1/2 and 1 with "probability" 1/2 · 2 = 1. After normalization, it returns 0 with probability 1/3 and 1 with probability 2/3. Because **score** allows decreasing the probability of a specific event, it renders **observe** unnecessary: in general, we can replace **observe**(e) by **score**(e ≠ 0). However, performing this replacement means losing the explicit knowledge of the weight on ↯.
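The normalization step for this example amounts to dividing each unnormalized weight by the total weight, sketched here (plain arithmetic, no assumptions beyond the listing):

```python
# x = flip(1/2) gives weight 1/2 to each branch; score(2) doubles the
# weight of the x = 1 branch, yielding the unnormalized weights below.
unnormalized = {0: 0.5, 1: 0.5 * 2}

Z = sum(unnormalized.values())  # normalization constant: 3/2
posterior = {v: w / Z for v, w in unnormalized.items()}

assert abs(posterior[0] - 1 / 3) < 1e-12
assert abs(posterior[1] - 2 / 3) < 1e-12
```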

**score** can be useful to modify the shape of a given distribution. For example, Listing 13 turns the distribution of x, which is a Gaussian distribution, into the Lebesgue measure λ, by multiplying the density of x by its reciprocal. Hence, the density of x at any location is 1. Note that the resulting distribution of x cannot be described by a probability measure because, e.g., the "probability" that x lies in the interval [0, 2] is 2.

Unfortunately, termination in the presence of **score** is not well-defined, as illustrated in Listing 14. In this program, the only non-terminating trace keeps changing its weight, switching between 1 and 2. In the limit, it is impossible to determine the weight of non-termination.

Hence, allowing the use of the **score** primitive only makes sense after abolishing the tracking of non-termination (⇑), which can be achieved by only measuring sets that do not contain non-termination. Formally, this means restricting the semantics of expressions e typed by Γ ⊢ e: τ to [[e]]τ: [[Γ]] → [[τ]] − {⇑}.

```
x := flip(1/2);
if x = 1 {
    score(2);
}
return x;
```

Listing 12. Using **score**

```
x := gauss(0,1);
score(√(2π) · e^(x²/2));
return x;
```

Listing 13. Reshaping a distribution.

```
i := 0;
while 1 {
    if i = 0 {
        score(2);
    } else {
        score(1/2);
    }
    i = 1 - i;
}
```

Listing 14. **score** vs non-termination

Intuitively, abolishing non-termination means that we ignore non-terminating runs (these result in weight on non-termination). After doing this, we can give well-defined semantics to the **score** primitive.

The typing rule and semantics of **score** are:

$$\frac{\Gamma \vdash e : \mathbb{R}}{\Gamma \vdash \mathtt{score}(e) \dashv \Gamma} \qquad \text{and} \qquad [\![\mathtt{score}(e)]\!] = \delta \overline{\times} [\![e]\!]_{\mathbb{R}} \mathbin{>\!\!=\!\!>} \lambda(\sigma, c).\ c \cdot \delta(\sigma)$$

After including **score** into our language, the semantics of the language can no longer be expressed in terms of probability kernels as stated in Theorem 2, because the probability of any event can be inflated beyond 1. Instead, the semantics must be expressed in terms of s-finite kernels.

Theorem 3. *After adding the score primitive and abolishing non-termination, the semantics of each expression* [[e]] *and statement* [[P]] *is an s-finite kernel.*


Table 3. Comparison of existing semantics to ours. When adding **score** to our language (Sect. 5.3), our semantics use s-finite kernels (not probability kernels).

*Proof.* As for Theorem 2, the proof proceeds by induction. Most parts of the proof are analogous (e.g., >=> preserves s-finite kernels instead of probability kernels). For while loops, the limit still exists (Lemma 7 still holds), but it is no longer bounded from above. The limit is indeed an s-finite kernel because the pointwise limit of an increasing sequence of s-finite kernels is an s-finite kernel.

In the presence of **score**, we can still talk about the interaction of different exceptions, assuming that we do track different types of exceptions (e.g., division by zero and out-of-bounds array accesses). Then, we keep the commutativity and associativity properties studied in the previous sections, because these still hold for s-finite kernels.

Listing 15 shows an interaction of **score** with **assert**. As one would expect, our semantics assigns weight 2 to ⊥ in this case. If the two statements are switched, our semantics ignores **score**(2) and assigns weight 1 to ⊥. Hence, again, commutativity does not hold.

```
score(2);
assert(false);
```

Listing 15. Interaction of **score** and **assert**

```
while 1 {
    score(2);
    assert(flip(1/2));
}
```

Listing 16. Interaction of **score**, **assert** and loops
Listing 16 shows a program that keeps increasing the probability of an error. In every loop iteration, there is a "probability" of 1 of running into an error. Overall, Listing 16 results in weight ∞ on state ⊥.

### 6 Related Work

Kozen provides classic semantics to probabilistic programs [23]. We follow his main ideas, but deviate in some aspects in order to introduce additional features or to make our presentation cleaner. The semantics by Hur et al. [19] is heavily based on [23], so we do not go into more detail here. Table 3 summarizes the comparison of our approach to that of others.

*Kernels.* Like our work, most modern approaches use kernels (i.e., functions from values to distributions) to provide semantics to probabilistic programs [4,24,33, 34]. Borgström et al. [4] use sub-probability kernels on (symbolic) expressions. Staton [34] uses s-finite kernels to capture the semantics of the **score** primitive (when we discuss **score** in Sect. 5.3, we do the same).

In the classic semantics of [23], Kozen uses distribution transformers (i.e., functions from distributions to distributions). For later work [24], Kozen also switches to sub-probability kernels, which has the advantage of avoiding redundancies. A different approach uses weakest precondition to define the semantics, as in [28]. Staton et al. [35] use a different concept of measurable functions A → P(**R**≥<sup>0</sup> ×B) (where P(S) denotes the set of all probability measures on S).

*Typing.* Some probabilistic languages are untyped [4,28], while others are limited to a single type: **R**^n [23,24] or ⋃_{i=1}^∞ **N**^i ∪ **N**^∞ [33]. Some languages provide more interesting types, including sum types, distribution types and tuples [34,35]. We allow tuples and array types, and we could easily account for sum types.

*Loops.* Because the semantics of while loops is not always straightforward, some languages avoid while loops and recursion altogether [35]. Borgström et al. handle recursion instead of while loops, defining the semantics in terms of a fixed point [4]. Many languages handle while loops by least fixed points [23,24,28,33]. Staton defines while loops in terms of the counting measure [34], which is similar to defining them by a fixed point. We define the semantics of while loops in terms of a fixed point, which avoids the need to prove the least fixed point exists (still, the classic while loop semantics of [23] and our formulation are equivalent).

Most languages do not explicitly track non-termination, but lose probability weight under non-termination [4,23,24,34]. This missing weight can be used to identify the probability of non-termination, but only if other exceptions (such as **fail** in [24] or observation failure in [4]) do not also result in missing weight. The semantics of [33] are tailored to applications in networks and lose *non-terminating packet histories* instead of weight (due to a particular least fixed point construction of Scott-continuous maps on algebraic and continuous directed complete partial orders). Some works define non-termination as missing weight in the weakest precondition [28]. Specifically, the semantics in [28] can also explicitly express the probability of non-termination *or* of ending up in some state (using the separate construct of a weakest liberal precondition). We model non-termination by an explicit state ⇑, which has the advantage that, in the context of lost weight, we know what part of that lost weight is due to non-termination.

Kaminski et al. [21] investigate the run-time of probabilistic programs with loops and **fail** (interpreted as early termination), but without observations. In [21], non-termination corresponds to an infinite run-time.

*Error States.* Many languages do not consider partial functions (like the fraction $\frac{a}{b}$) and thus never run into an exception state [23,24,33]. Olmedo et al. [28] do not consider partial functions, but support the related concept of an explicit **abort**. The semantics of **abort** relies on missing weight in the final distribution. Some languages handle expressions whose evaluation may fail using sum types [34,35], forcing the programmer to deal with errors explicitly (we discuss the disadvantages of this approach in Listing 6). Formally, a sum type A + B is a disjoint union of the two sets A and B. Defining the semantics of an expression in terms of the sum type A + {⊥} allows that expression to evaluate *either* to a value a ∈ A *or* to ⊥. Borgström et al. [4] have a single state **fail** expressing exceptions such as dynamically detected type errors (without forcing the programmer to deal with exceptions explicitly). Our semantics also uses sum types to handle exceptions, but the handling is implicit, by defining the semantics in terms of (>=>) (which defines how exceptions propagate in a program) instead of (;).
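The following Python sketch (our illustration; the states and kernels are invented for the example) shows the key idea of implicit exception propagation in the style of (>=>) on discrete sub-distributions: exceptional outcomes skip the second kernel, so the programmer never matches on them explicitly.

```python
# Illustrative sketch of (>=>)-style implicit exception handling
# on discrete sub-distributions.

ERROR = ("exception", "error")  # a single error state, as in [4]

def is_exception(v):
    return isinstance(v, tuple) and v[0] == "exception"

def kleisli(f, g):
    # (f >=> g): run f, then g; exception states pass through g unchanged.
    def composed(a):
        out = {}
        for b, p in f(a).items():
            succ = {b: 1.0} if is_exception(b) else g(b)
            for c, q in succ.items():
                out[c] = out.get(c, 0.0) + p * q
        return out
    return composed

def div(pair):
    # A partial function: division signals the error state on 0.
    x, y = pair
    return {ERROR: 1.0} if y == 0 else {x / y: 1.0}

half = kleisli(div, lambda v: {v * 0.5: 1.0})
# half((1, 0)) keeps the error state instead of losing its weight.
```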

*Constraints.* To enforce hard constraints, we use the **observe**(e) statement, which puts the program into a special failure state if it does not satisfy e. We can encode soft constraints by **observe**(e), where e is probabilistic (this is a general technique). Borgström et al. [4] allow both soft constraints that reduce the probability of some program traces and hard constraints whose failure leads to the error state **fail**. Some languages can handle generalized soft constraints: they can not only decrease the probability of certain traces using soft constraints, but also increase them, using **score**(x) [34,35]. We investigate the consequences of adding **score** to our language in Sect. 5.3. Kozen [24] handles hard (and hence soft) constraints using **fail** (which results in a sub-probability distribution). Some languages can handle neither hard nor soft constraints [23,33]. Note though that the semantics of ProbNetKAT in [33] can drop certain packets, which is a similar behavior. Olmedo et al. [28] handle hard (and hence soft) constraints by a conditional weakest precondition that tracks both the probability of not failing any observation and the probability of ending in specific states. Unfortunately, this work is restricted to discrete distributions and is specifically designed to handle observation failures and non-termination. Thus, it is not obvious how to adapt the semantics if a different kind of exception is to be added.
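On a discrete distribution, the effect of a hard constraint can be sketched as follows (our illustration, not the paper's semantics): **observe**(e) moves the violating mass to an explicit observation-failure state, and conditioning then renormalizes the remaining mass.

```python
# Illustrative sketch: observe(e) as a hard constraint with an explicit
# observation-failure state, followed by renormalization (conditioning).

OBS_FAIL = ("exception", "observe")

def observe(pred, dist):
    out = {}
    for v, p in dist.items():
        key = v if pred(v) else OBS_FAIL
        out[key] = out.get(key, 0.0) + p
    return out

def normalize(dist):
    # Conditioning: renormalize the non-failure mass.
    ok = {v: p for v, p in dist.items() if v != OBS_FAIL}
    z = sum(ok.values())
    return {v: p / z for v, p in ok.items()}

die = {i: 1 / 6 for i in range(1, 7)}
cond = normalize(observe(lambda v: v % 2 == 0, die))
# cond is uniform over {2, 4, 6}; the failure weight 0.5 stays visible
# before normalization.
```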

*Interaction of Different Exceptions.* Most existing work handles at least some exceptions by sub-probability distributions [4,23,24,33,34]. Then, any missing weight in the final distribution must be due to exceptions. However, this leads to a conflation of all exceptions handled by sub-probability distributions (for the consequences of this, see, e.g., our discussion of Listing 8). Note that semantics based on sub-probability kernels can add more exceptions, but they will simply be conflated with all other exceptions.
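A tiny example (invented here for illustration) makes the conflation concrete: once every exception is encoded as missing weight, two programs that fail for entirely different reasons have the same denotation.

```python
# Illustrative sketch: with missing-weight encodings, different failure
# causes yield observationally identical sub-distributions.

def observation_fails_half():
    # observe fails with probability 0.5; that mass is dropped.
    return {1: 0.5}

def diverges_half():
    # the program fails to terminate with probability 0.5; also dropped.
    return {1: 0.5}

# The two denotations are equal, so no analysis of the output
# sub-distribution can tell the failure causes apart.
```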

Some previous work does not (exclusively) rely on sub-probability distributions. Borgström et al. [4] handle errors implicitly, but still use sub-probability kernels to handle non-termination and **score**. Olmedo et al. can distinguish non-termination (which is conflated with exception failure) from failing observations by introducing two separate semantic primitives (the conditional weakest precondition and the conditional weakest liberal precondition) [28]. Because their solution specifically addresses non-termination, it is non-trivial to generalize this treatment to more than two exception states. By using sum types, some semantics avoid interactions of errors with non-termination or constraint failures, but still cannot distinguish the latter [34,35]. Note that semantics based on sum types can easily add more exceptions (although it is impossible to add non-termination). However, the interaction of different exceptions cannot be observed, because the programmer has to handle exceptions explicitly.

To the best of our knowledge, we are the first to give formal semantics to programs that may produce exceptions in this generality. One work investigates assertions in probabilistic programs, but explicitly disallows non-terminating loops [32]. Moreover, the semantics in [32] are operational, leaving the distribution (in terms of measure theory) of program outputs unclear. Cho et al. [8] investigate the interaction of partial programs and observe, but are restricted to discrete distributions and to only two exception states. In addition, this investigation treats these two exception states differently, making it non-trivial to extend the results to three or more exception states. Katoen et al. [22] investigate the intuitive problems when combining non-termination and observations, but restrict their discussions to discrete distributions and do not provide formal semantics. Huang [17] treats partial functions, but not different kinds of exceptions. In general, we know of no probabilistic programming language that distinguishes more than two different kinds of exceptions. Distinguishing two kinds of exceptions is simpler than three, because it is possible to handle one exception as an explicit exception state and the other one by missing weight (as e.g. in [4]).

Cousot and Monerau [9] provide a trace semantics that captures probabilistic behavior by an explicit randomness source given to the program as an argument. This allows handling non-termination by non-terminating traces. While the work does not discuss errors or observation failure, it is possible to add both. However, using an explicit randomness source has other disadvantages, already discussed by Kozen [23]. Most notably, this approach requires a distribution over the randomness source and a translation from the randomness source to random choices in the program, even though we only care about the distribution of the latter.

### 7 Conclusion

In this work we presented an expressive probabilistic programming language that supports important features such as mixing continuous and discrete distributions, arrays, observations, partial functions and while loops. Unlike prior work, our semantics distinguishes non-termination, observation failures and error states. This allows us to investigate the subtle interaction of different exceptions, which is not possible for semantics that conflate different kinds of exceptions. Our investigation confirms the intuitive understanding of the interaction of exceptions presented in Sect. 2. However, it also shows that some desirable properties, like commutativity, only hold in the absence of exceptions. This situation is unavoidable, and largely analogous to the situation in deterministic languages.

Even though our semantics only distinguishes three exception states, it can be trivially extended to handle any countable set of exception states. This allows for an even finer-grained distinction of, e.g., division by zero, out-of-bounds array accesses or casting failures (in a language that allows type casting). Our semantics also allows enriching exceptions with the line number at which the exception originated (of course, this is not possible for non-termination). For an uncountable set of exception states, an extension is possible but not trivial.
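The extension to countably many exception states can be sketched as follows (our illustration; all names are invented): tag each exception with a kind and an originating line number, and let composition pass tagged exceptions through unchanged.

```python
# Illustrative sketch: a countable set of exception states, each tagged
# with a kind and an originating line number.

def exc(kind, line):
    return ("exception", kind, line)

def is_exc(v):
    return isinstance(v, tuple) and v[0] == "exception"

def bind(dist, g):
    # Kleisli bind on discrete distributions; exceptions pass through.
    out = {}
    for v, p in dist.items():
        succ = {v: 1.0} if is_exc(v) else g(v)
        for w, q in succ.items():
            out[w] = out.get(w, 0.0) + p * q
    return out

prog = bind({0: 0.5, 2: 0.5},
            lambda x: {exc("div_by_zero", 3): 1.0} if x == 0
                      else {10 / x: 1.0})
# prog records which exception occurred and where, with weight 0.5 each.
```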

### A Proofs for Preliminaries

In this section, we provide lemmas, proofs and some definitions that were left out or cut short in Sect. 3. For a more detailed introduction to measure theory, we recommend the book *A crash course on the Lebesgue integral and measure theory* [7].

### A.1 Measures

Definition 3. *Let* $(A, \Sigma_A)$ *be a measurable space and* $\mu\colon \Sigma_A \to [0,\infty]$ *a measure on* $A$*. Then* $\mu$ *is called finite if* $\mu(A) < \infty$*; a sub-probability measure if* $\mu(A) \le 1$*;* $\sigma$*-finite if* $A = \bigcup_{i\in\mathbf{N}} A_i$ *for some* $A_i \in \Sigma_A$ *with* $\mu(A_i) < \infty$*; and s-finite if* $\mu = \sum_{i\in\mathbf{N}} \mu_i$ *for sub-probability measures* $\mu_i$*.*


Note that for a $\sigma$-finite measure $\mu$, $\mu(A) = \infty$ is possible, even though $\mu(A_i) < \infty$ for all $i$. As an example, the Lebesgue measure is $\sigma$-finite because $\mathbf{R} = \bigcup_{i\in\mathbf{N}} [-i, i]$ with $\lambda([-i, i]) = 2i$, but $\lambda(\mathbf{R}) = \infty$.

Lemma 12. *The following definition of s-finite measures is equivalent to our definition of s-finite measures (the difference is that the* $\mu_i$*s are only required to be finite):*

$$\mu = \sum_{i \in \mathbf{N}} \mu_i \quad \text{for finite measures } \mu_i$$
*Proof.* Since any sub-probability measure is finite, one direction is trivial. For the other direction, let $\mu = \sum_{i\in\mathbf{N}} \mu_i'$ for finite measures $\mu_i'$. Obviously, $\mu \ge 0$, $\mu(\emptyset) = 0$ and $\mu(\bigcup_{i\in\mathbf{N}} A_i) = \sum_{i\in\mathbf{N}} \mu(A_i)$ for mutually disjoint $A_i \in \Sigma_A$, so $\mu$ is a measure. To show that $\mu$ can be written as a sum of sub-probability measures, let $n_i := \lceil \mu_i'(A) \rceil$ (taking $n_i = 1$ if $\mu_i'(A) = 0$). Then, $\mu = \sum_{i\in\mathbf{N}} \mu_i' = \sum_{i\in\mathbf{N}} \frac{n_i}{n_i}\mu_i' = \sum_{i\in\mathbf{N}} \sum_{j\in[n_i]} \frac{1}{n_i}\mu_i'$, where each $\frac{1}{n_i}\mu_i'$ is a sub-probability measure because $\frac{1}{n_i}\mu_i'(A) \le 1$.

Lemma 13. *Any* <sup>σ</sup>*-finite measure* <sup>μ</sup>: <sup>Σ</sup>A <sup>→</sup> [0,∞] *is s-finite.*

*Proof.* Since $\mu$ is $\sigma$-finite, $A = \bigcup_{i\in\mathbf{N}} A_i$ with $A_i \in \Sigma_A$ and $\mu(A_i) < \infty$. Without loss of generality, assume that the $A_i$ form a partition of $A$. Then, $\mu(S) = \sum_{i\in\mathbf{N}} \mu(S \cap A_i)$, where each $\mu(\cdot \cap A_i)$ is a finite measure. Thus, $\mu$ is a countable sum of finite measures.

Definition 4. *The counting measure* c : B → [0,∞] *is defined by*

$$c(S) = \begin{cases} |S| & S \ finite \\ \infty & otherwise \end{cases}$$

Definition 5. *The infinity measure* μ: B → [0,∞] *is defined by*

$$\mu(S) = \begin{cases} 0 & S = \emptyset \\ \infty & otherwise \end{cases}$$

Lemma 14. *Neither the counting measure nor the infinity measure is s-finite.*

*Proof.* For the counting measure $c$, assume (toward a contradiction) $c = \sum_{i\in\mathbf{N}} c_i$ for finite measures $c_i$. We have $\mathbf{R} = \{r \in \mathbf{R} \mid c(\{r\}) > 0\} = \bigcup_{i\in\mathbf{N}} \{r \in \mathbf{R} \mid c_i(\{r\}) > 0\} = \bigcup_{i\in\mathbf{N}} \bigcup_{n\in\mathbf{N}} \{r \in \mathbf{R} \mid c_i(\{r\}) > \frac{1}{n}\}$. Because $\mathbf{R}$ is uncountable, there must be $i, n \in \mathbf{N}$ for which $S := \{r \in \mathbf{R} \mid c_i(\{r\}) > \frac{1}{n}\}$ is uncountable. Thus, for any measurable, countably infinite $S' \subseteq S$, $c_i(S') = \infty$, which means that $c_i$ is not finite. Proceed analogously for the infinity measure.

Lemma 15. *The measure* $\mu\colon \mathcal{B} \to [0,\infty]$ *with*

$$\mu(S) = \begin{cases} 0 & \lambda(S) = 0 \\ \infty & \lambda(S) > 0 \end{cases}$$

*is s-finite but not* $\sigma$*-finite.*

*Proof.* $\mu = \sum_{i\in\mathbf{N}} \lambda$, and $\lambda$ is s-finite, so $\mu$ is s-finite. Assume (toward a contradiction) that $\mu$ is $\sigma$-finite. Then, $\mathbf{R} = \bigcup_{i\in\mathbf{N}} A_i$ with $A_i \in \mathcal{B}$ and $\mu(A_i) < \infty$. Thus, $\mu(A_i) = 0$ and hence $\mu(\mathbf{R}) = \mu(\bigcup_{i\in\mathbf{N}} A_i) \le \sum_{i\in\mathbf{N}} \mu(A_i) = 0$, a contradiction.

Lemma 16

$$\begin{aligned} \forall S \in \Sigma_{A \times B}\colon\ (\mu \times \mu')(S) &= \int_{a \in A} \mu'(\{b \in B \mid (a, b) \in S\})\, \mu(da) \\ &= \int_{b \in B} \mu(\{a \in A \mid (a, b) \in S\})\, \mu'(db) \end{aligned}$$

$$\begin{aligned} \forall S \in \Sigma_{\overline{A \times B}}\colon\ (\mu \mathbin{\overline{\times}} \mu')(S) &= \int_{a \in \overline{A}} \mu'(\{b \in \overline{B} \mid \overline{(a, b)} \in S\})\, \mu(da) \\ &= \int_{b \in \overline{B}} \mu(\{a \in \overline{A} \mid \overline{(a, b)} \in S\})\, \mu'(db) \end{aligned}$$

*Proof*

$$\begin{aligned} (\mu \times \mu')(S) &= \int_{a \in A} \int_{b \in B} [(a, b) \in S]\, \mu'(db)\, \mu(da) \\ &= \int_{a \in A} \int_{b \in B} \left[ b \in \{ b' \in B \mid (a, b') \in S \} \right] \mu'(db)\, \mu(da) \\ &= \int_{a \in A} \mu'(\{ b' \in B \mid (a, b') \in S \})\, \mu(da) \end{aligned}$$

172 B. Bichsel et al.

$$\begin{aligned} (\mu \times \mu')(S) &= \int\limits\_{a \in A} \int\limits\_{b \in B} [(a, b) \in S] \mu'(db) \mu(da) \\ &= \int\limits\_{b \in B} \int\limits\_{a \in A} [(a, b) \in S] \mu(da) \mu'(db) \\ &= \dots \\ &= \int\limits\_{b \in B} \mu(\{a' \in A \mid (a', b) \in S\}) \mu'(db) \end{aligned} \text{Fubini}$$

In the second line, we have used that $(a, b) \in S \iff b \in \{b' \in B \mid (a, b') \in S\}$. The proof works analogously for $\overline{\times}$.

Lemma 17. *Let* δ : A → A*,* κ: A → B*. Then,*

$$(\delta \overline{\times} \kappa)(a)(S) = \kappa(a)(\{b \in \overline{B} \mid \overline{(a,b)} \in S\})$$

*Proof*

$$\begin{aligned} (\delta \mathbin{\overline{\times}} \kappa)(a)(S) &= \int_{b \in \overline{B}} \delta(a)(\{a' \in \overline{A} \mid \overline{(a',b)} \in S\})\, \kappa(a)(db) && \text{Lemma 16} \\ &= \int_{b \in \overline{B}} [\overline{(a,b)} \in S]\, \kappa(a)(db) \\ &= \kappa(a)(\{b \in \overline{B} \mid \overline{(a,b)} \in S\}) \end{aligned}$$

Lemma 1. *For measures* $\mu\colon \Sigma_A \to [0,\infty]$*,* $\mu'\colon \Sigma_B \to [0,\infty]$*, let* $S \in \Sigma_A$ *and* $T \in \Sigma_B$*. Then,* $(\mu \times \mu')(S \times T) = \mu(S) \cdot \mu'(T)$*.*

*Proof*

$$\begin{aligned} (\mu \times \mu')(S \times T) &= \int_{a \in A} \mu'(\{b \in B \mid (a, b) \in S \times T\})\, \mu(da) && \text{Lemma 16} \\ &= \int_{a \in A} \mu'\!\left( \begin{cases} T & a \in S \\ \emptyset & \text{otherwise} \end{cases} \right) \mu(da) \\ &= \int_{a \in S} \mu'(T)\, \mu(da) \\ &= \mu(S) \cdot \mu'(T) \end{aligned}$$
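As a quick finite-space illustration of this factorization (the point-mass measures below are invented for the example), one can check Lemma 1 numerically:

```python
# Finite-space sanity check of Lemma 1: on a rectangle S x T, the
# product measure factors into mu(S) * mu'(T).

mu  = {0: 0.2, 1: 0.8}          # measure on A = {0, 1}
mu2 = {"a": 0.5, "b": 0.3}      # sub-probability measure on B = {"a", "b"}

def product_measure(m1, m2, S):
    # (m1 x m2)(S) for a set S of pairs, via the iterated-sum analogue
    # of the iterated integral in Lemma 16.
    return sum(p * q for a, p in m1.items() for b, q in m2.items()
               if (a, b) in S)

S, T = {1}, {"a", "b"}
rect = {(a, b) for a in S for b in T}
lhs = product_measure(mu, mu2, rect)
rhs = sum(mu[a] for a in S) * sum(mu2[b] for b in T)
# lhs == rhs == 0.8 * 0.8
```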

Lemma 2. $\times$ *and* $\overline{\times}$ *for s-finite measures are associative, left- and right-distributive and preserve (sub-)probability and s-finite measures.*

*Proof.* Remember that $(\mu \times \mu')(S) = \int_{a\in A}\int_{b\in B} [(a,b) \in S]\, \mu'(db)\, \mu(da)$ and that $(\mu \mathbin{\overline{\times}} \mu')(S) = \int_{a\in \overline{A}}\int_{b\in \overline{B}} [\overline{(a,b)} \in S]\, \mu'(db)\, \mu(da)$. Preservation of (sub-)probability measures is trivial. Distributivity and preservation of s-finite measures are easily established by properties of the Lebesgue integral in Lemma 19.

For associativity, let $\mu\colon \Sigma_A \to [0,\infty]$, $\mu'\colon \Sigma_B \to [0,\infty]$ and $\mu''\colon \Sigma_C \to [0,\infty]$.

$$\begin{aligned} &((\mu \times \mu') \times \mu'')(S) \\ &= \int_{c \in C} (\mu \times \mu')(\{t \in A \times B \mid (t, c) \in S\})\, \mu''(dc) && \text{Lemma 16} \\ &= \int_{c \in C} \int_{a \in A} \int_{b \in B} [(a, b) \in \{t \in A \times B \mid (t, c) \in S\}]\, \mu'(db)\, \mu(da)\, \mu''(dc) \\ &= \int_{c \in C} \int_{a \in A} \int_{b \in B} [(a, b, c) \in S]\, \mu'(db)\, \mu(da)\, \mu''(dc) \\ &= \int_{a \in A} \int_{b \in B} \int_{c \in C} [(a, b, c) \in S]\, \mu''(dc)\, \mu'(db)\, \mu(da) && \text{Fubini} \\ &= \int_{a \in A} \int_{b \in B} \int_{c \in C} [(b, c) \in \{t \in B \times C \mid (a, t) \in S\}]\, \mu''(dc)\, \mu'(db)\, \mu(da) \\ &= \int_{a \in A} (\mu' \times \mu'')(\{t \in B \times C \mid (a, t) \in S\})\, \mu(da) && \text{Lemma 16} \\ &= (\mu \times (\mu' \times \mu''))(S) \end{aligned}$$

The proof proceeds analogously for $\overline{\times}$.

Lemma 18. *Let* $(A, \Sigma_A)$ *and* $(B, \Sigma_B)$ *be measurable spaces. Consider measures* $\mu, \mu_1, \mu_2\colon \Sigma_A \to [0,\infty]$ *and* $\nu, \nu_1, \nu_2\colon \Sigma_B \to [0,\infty]$*. We assume that* $\nu_1 \le \nu_2$ *and* $\mu_1 \le \mu_2$ *hold pointwise. Then,*

$$
\begin{aligned}
\mu \overline{\times} \nu\_1 &\leq \mu \overline{\times} \nu\_2 \\
\mu\_1 \overline{\times} \nu &\leq \mu\_2 \overline{\times} \nu
\end{aligned}
$$

*Proof.* Let $S \in \Sigma_{\overline{A \times B}}$ and $\nu_1 \le \nu_2$. Then, we have

$$\begin{aligned} \nu_1 &\le \nu_2 \\ \implies \underbrace{\int_{b\in\overline{B}} [\overline{(a,b)} \in S]\, \nu_1(db)}_{=:f(a)} &\le \underbrace{\int_{b\in\overline{B}} [\overline{(a,b)} \in S]\, \nu_2(db)}_{=:g(a)} && \text{Lemma 19} \\ \implies \int_{a\in\overline{A}} f(a)\,\mu(da) &\le \int_{a\in\overline{A}} g(a)\,\mu(da) && \text{Lemma 19} \\ \implies (\mu\mathbin{\overline{\times}}\nu_1)(S) &\le (\mu\mathbin{\overline{\times}}\nu_2)(S) \end{aligned}$$

The proof for $\mu_1 \mathbin{\overline{\times}} \nu \le \mu_2 \mathbin{\overline{\times}} \nu$ is similar.

#### A.2 Lebesgue Integral

Lemma 19. *Let* $(A, \Sigma_A)$ *and* $(B, \Sigma_B)$ *be measurable spaces,* $E \in \Sigma_A$ *and* $E' \in \Sigma_B$ *measurable sets,* $f, f_i, g\colon A \to \mathbf{R}$ *and* $h\colon A \times B \to \mathbf{R}$ *measurable functions,*

$$\mu, \mu\_i, \nu \colon \Sigma\_A \to [0, \infty] \text{ and } \mu' \colon \Sigma\_B \to [0, \infty] \text{ measures.}$$

$$\int\_{a \in E} f(a) \mu(da) \in [0, \infty]$$

$$0 \le f \le g \le \infty \implies \int\_{a \in E} f(a) \mu(da) \le \int\_{a \in E} g(a) \mu(da)$$

$$\mu \le \nu \implies \int\_{a \in E} f(a) \mu(da) \le \int\_{a \in E} f(a) \nu(da)$$

$$\sum\_{n=1}^{\infty} \int\_{a \in E} f\_n(a) \mu(da) = \int\_{a \in E} \sum\_{n=1}^{\infty} f\_n(a) \mu(da)$$

$$\int_{a \in E} \int_{b \in E'} h(a, b)\, \mu'(db)\, \mu(da) = \int_{b \in E'} \int_{a \in E} h(a, b)\, \mu(da)\, \mu'(db) \qquad \mu, \mu'\ \sigma\text{-finite}$$

$$\int_{a \in E} f(a) \left(\sum_{i=1}^{\infty} \mu_i\right)(da) = \sum_{i=1}^{\infty} \int_{a \in E} f(a)\, \mu_i(da)$$

$$\int\_{a \in E} f(a) \delta(x) (da) = f(x)$$

*Finally, if* $f_1 \le f_2 \le \dots \le \infty$*, we have*

$$\lim\_{n \to \infty} \int\_{a \in E} f\_n(a)\mu(da) = \int\_{a \in E} \lim\_{n \to \infty} f\_n(a)\mu(da)$$

*Proof.* The following properties can be proven for simple functions and limits of simple functions (this suffices):

$$\int_{a \in E} f(a) \left( \sum_{i=1}^{\infty} \mu_i \right)(da) = \sum_{i=1}^{\infty} \int_{a \in E} f(a)\, \mu_i(da)$$

$$\mu \le \nu \implies \int\_{a \in E} f(a) \mu(da) \le \int\_{a \in E} f(a) \nu(da)$$

$\int_{a \in E} f(a)\, \delta(x)(da) = f(x)$ is straightforward. For the other properties, see [31].

Theorem 1 (Fubini's theorem). *For s-finite measures* $\mu\colon \Sigma_A \to [0,\infty]$ *and* $\mu'\colon \Sigma_B \to [0,\infty]$ *and any measurable function* $f\colon A \times B \to [0,\infty]$*,*

$$\int\_{a \in A} \int\_{b \in B} f(a, b) \mu'(db) \mu(da) = \int\_{b \in B} \int\_{a \in A} f(a, b) \mu(da) \mu'(db)$$

*For s-finite measures* $\mu\colon \Sigma_{\overline{A}} \to [0,\infty]$ *and* $\mu'\colon \Sigma_{\overline{B}} \to [0,\infty]$ *and any measurable function* $f\colon \overline{A} \times \overline{B} \to [0,\infty]$*,*

$$\int_{a \in \overline{A}} \int_{b \in \overline{B}} f(a,b)\,\mu'(db)\,\mu(da) = \int_{b \in \overline{B}} \int_{a \in \overline{A}} f(a,b)\,\mu(da)\,\mu'(db)$$

*Proof.* Let $\mu = \sum_{i\in\mathbf{N}} \mu_i$ and $\mu' = \sum_{i\in\mathbf{N}} \mu_i'$ for bounded measures $\mu_i$ and $\mu_i'$.

$$\begin{aligned} &\int\_{a\in A} \int\_{b\in B} f(a,b)\mu'(db)\mu(da) \\ &= \sum\_{i,j\in\mathbb{N}} \int\_{a\in A} \int\_{b\in B} f(a,b)\mu'\_j(db)\mu\_i(da) \quad \text{Lemma 19} \\ &= \sum\_{i,j\in\mathbb{N}} \int\_{b\in B} \int\_{a\in A} f(a,b)\mu\_i(da)\mu'\_j(db) \quad \text{Fubini for} \sigma\text{-finite measures } \mu\_i, \mu'\_j \\ &= \int\_{b\in B} \int\_{a\in A} f(a,b)\mu(da)\mu'(db) \end{aligned}$$

The proof in the presence of exception states is analogous.

Lemma 20. *Fubini does not hold for the counting measure* c : B → [0,∞] *and the Lebesgue measure* λ: B → [0,∞] *(because* c *is not s-finite).*

*Proof*

$$\begin{aligned} \int\_{x \in [0,1]} \int\_{y \in [0,1]} [x = y] c(dy) \lambda(dx) &= \int\_{x \in [0,1]} 1 \lambda(dx) = 1 \\ \int\_{y \in [0,1]} \int\_{x \in [0,1]} [x = y] \lambda(dx) c(dy) &= \int\_{y \in [0,1]} 0 c(dy) = 0 \end{aligned}$$

#### A.3 Kernels

Lemma 21. *Let* $\kappa_1, \kappa_1'\colon A \to B$ *and* $\kappa_2, \kappa_2'\colon B \to C$ *be s-finite kernels. If* $\kappa_1 \le \kappa_1'$ *holds pointwise, then*

$$\kappa_1 \mathrel{>\!=\!>} \kappa_2 \le \kappa_1' \mathrel{>\!=\!>} \kappa_2$$

*If* $\kappa_2 \le \kappa_2'$ *holds pointwise, then*

$$\kappa_1 \mathrel{>\!=\!>} \kappa_2 \le \kappa_1 \mathrel{>\!=\!>} \kappa_2'$$

*Proof.* Assume $\kappa_2 \le \kappa_2'$. Thus, $\overline{\kappa_2} \le \overline{\kappa_2'}$. Now, let $a \in A$, $S \in \Sigma_{\overline{C}}$.

$$\begin{aligned} (\kappa_1 \mathrel{>\!=\!>} \kappa_2)(a)(S) &= \int_{b \in \overline{B}} \overline{\kappa_2}(b)(S)\, \kappa_1(a)(db) \\ &\le \int_{b \in \overline{B}} \overline{\kappa_2'}(b)(S)\, \kappa_1(a)(db) && \overline{\kappa_2} \le \overline{\kappa_2'},\ \text{Lemma 19} \\ &= (\kappa_1 \mathrel{>\!=\!>} \kappa_2')(a)(S) \end{aligned}$$

The proof for $\kappa_1 \mathrel{>\!=\!>} \kappa_2 \le \kappa_1' \mathrel{>\!=\!>} \kappa_2$ works analogously.

Lemma 3. (;) *is associative, left- and right-distributive, has neutral element*<sup>4</sup> δ *and preserves (sub-)probability and s-finite kernels.*

<sup>4</sup> δ is a neutral element of (;) if (δ;κ)=(κ;δ) = κ for all kernels κ.

*Proof.* Remember that $(f;g)(a)(S) = \int_{b\in B} g(b)(S)\, f(a)(db)$. Left- and right-distributivity and the neutral element $\delta$ follow from properties of the Lebesgue integral in Lemma 19.

Associativity and preservation of (sub-)probability kernels is well known (see for example [12]). For s-finite kernels $f = \sum_{i\in\mathbf{N}} f_i$, $g = \sum_{i\in\mathbf{N}} g_i$ and $h = \sum_{i\in\mathbf{N}} h_i$, we have (for sub-probability kernels $f_i$, $g_i$, $h_i$)

$$\begin{aligned} (f;g);h &= \left(\left(\sum\_{i\in\mathbb{N}} f\_i\right); \left(\sum\_{j\in\mathbb{N}} g\_j\right)\right); \sum\_{k\in\mathbb{N}} h\_k = \sum\_{i,j,k\in\mathbb{N}} (f\_i;g\_j);h\_k\\ &= \sum\_{i,j,k\in\mathbb{N}} f\_i; (g\_j;h\_k) = f;(g;h) \end{aligned}$$

(;) preserves s-finite kernels because for s-finite kernels $f$ and $g$, we have (for sub-probability kernels $f_i$, $g_j$) $f;g = \sum_{i,j\in\mathbf{N}} f_i;g_j$, a sum of sub-probability kernels.

Lemma 4. *For kernels* $f\colon A \to B$ *and* $g\colon B \to C$*,* $a \in A$ *and* $S \in \Sigma_{\overline{C}}$*,*

$$(f \mathrel{>\!=\!>} g)(a)(S) = (f;g)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\, f(a)(\{x\})$$

*Proof*

$$\begin{aligned} (f \mathrel{>\!=\!>} g)(a)(S) &= \int_{b \in \overline{B}} \overline{g}(b)(S)\, f(a)(db) \\ &= \int_{b \in B} \overline{g}(b)(S)\, f(a)(db) + \int_{b \in \mathcal{X}} \overline{g}(b)(S)\, f(a)(db) \\ &= \int_{b \in B} g(b)(S)\, f(a)(db) + \sum_{x \in \mathcal{X}} \overline{g}(x)(S)\, f(a)(\{x\}) \\ &= (f;g)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\, f(a)(\{x\}) \end{aligned}$$
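The decomposition of Lemma 4 can be checked on finite discrete kernels (our illustration; the states and kernels below are invented): (;) integrates only over non-exception states, while (>=>) additionally passes exception states through.

```python
# Finite-space check of Lemma 4:
# (f >=> g) = (f;g) + pass-through of f's exception mass.

X = {"err", "obs"}  # exception states

def seq(f, g):
    # (f;g): exception mass produced by f is dropped.
    def h(a):
        out = {}
        for b, p in f(a).items():
            if b in X:
                continue
            for c, q in g(b).items():
                out[c] = out.get(c, 0.0) + p * q
        return out
    return h

def kleisli(f, g):
    # (f >=> g): exception mass produced by f passes through unchanged.
    def h(a):
        out = {}
        for b, p in f(a).items():
            succ = {b: 1.0} if b in X else g(b)
            for c, q in succ.items():
                out[c] = out.get(c, 0.0) + p * q
        return out
    return h

f = lambda a: {0: 0.25, 1: 0.25, "err": 0.5}
g = lambda b: {b + 1: 1.0}

lhs = kleisli(f, g)("start")
rhs = seq(f, g)("start")
for x in X:
    w = f("start").get(x, 0.0)
    if w:
        rhs[x] = rhs.get(x, 0.0) + w
# lhs == rhs, matching (f >=> g) = (f;g) + exception pass-through.
```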

Lemma 5. >=> *is associative, left-distributive (but not right-distributive), has neutral element* δ *and preserves (sub-)probability and s-finite kernels.*

*Proof.* Remember that $(f \mathrel{>\!=\!>} g)(a)(S) = \int_{b\in \overline{B}} \overline{g}(b)(S)\, f(a)(db)$. Left-distributivity follows from the properties of the Lebesgue integral in Lemma 19. Right-distributivity does not necessarily hold because $\overline{g_1 + g_2}(\bot) \ne \overline{g_1}(\bot) + \overline{g_2}(\bot)$. Associativity for $f\colon A \to B$, $g\colon B \to C$ and $h\colon C \to D$ can be derived by

$$\begin{aligned} &((f \mathrel{>\!=\!>} g) \mathrel{>\!=\!>} h)(a)(S) \\ &= ((f \mathrel{>\!=\!>} g);h)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,(f \mathrel{>\!=\!>} g)(a)(\{x\}) \\ &= \left(\left(f;g + \lambda a'.\lambda S'.\sum_{x \in \mathcal{X}} \delta(x)(S')\,f(a')(\{x\})\right);h\right)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,(f \mathrel{>\!=\!>} g)(a)(\{x\}) \\ &= (f;g;h)(a)(S) + \underbrace{\left(\left(\lambda a'.\lambda S'.\sum_{x \in \mathcal{X}} \delta(x)(S')\,f(a')(\{x\})\right);h\right)(a)(S)}_{=\,0\ \text{since}\ (;)\ \text{integrates over non-exception states}} + \sum_{x \in \mathcal{X}} \delta(x)(S)\,(f \mathrel{>\!=\!>} g)(a)(\{x\}) \\ &= (f;g;h)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\left((f;g)(a)(\{x\}) + \sum_{x' \in \mathcal{X}} \delta(x')(\{x\})\,f(a)(\{x'\})\right) \\ &= (f;g;h)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,(f;g)(a)(\{x\}) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\}) \\ &= (f;g;h)(a)(S) + \left(f;\lambda a'.\lambda S'.\sum_{x \in \mathcal{X}} \delta(x)(S')\,g(a')(\{x\})\right)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\}) \end{aligned}$$

$$\begin{aligned} &= \left( f; \left( g;h + \lambda a'.\lambda S'. \sum_{x \in \mathcal{X}} \delta(x)(S')\, g(a')(\{x\}) \right) \right)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\, f(a)(\{x\}) \\ &= \left( f; \left( g \mathrel{>\!=\!>} h \right) \right)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\, f(a)(\{x\}) \\ &= (f \mathrel{>\!=\!>} (g \mathrel{>\!=\!>} h))(a)(S) \end{aligned}$$

Here, we have used Lemma 4 and the left- and right-distributivity of (;).

To show that $\mathrel{>\!=\!>}$ preserves s-finite kernels, let $f\colon A \to B$ and $g\colon B \to C$ be s-finite kernels. Then, for sub-probability kernels $f_i$ with $f = \sum_{i\in\mathbf{N}} f_i$,

$$\begin{aligned} (f \gg \gg g)(a)(S) &= (f;g)(a)(S) + \sum\_{x \in \mathcal{X}} \delta(x)(S)f(a)(\{x\}) \\ &= (f;g)(a)(S) + \sum\_{x \in \mathcal{X}} \sum\_{i \in \mathbb{N}} \delta(x)(S)f\_i(a)(\{x\}) \end{aligned}$$

Note that for each $x \in \mathcal{X}$ and $i \in \mathbf{N}$, $\lambda a.\lambda S.\,\delta(x)(S)\,f_i(a)(\{x\})$ is a sub-probability kernel. Since $f;g$ is s-finite by Lemma 3, $f \mathrel{>\!=\!>} g$ is a sum of s-finite kernels and hence s-finite.

Proving that for (sub-)probability kernels $f$ and $g$, $f \mathrel{>\!=\!>} g$ is also a (sub-)probability kernel is trivial, since we only need to show that $(f \mathrel{>\!=\!>} g)(a)(\overline{C}) = 1$ (or $\le 1$).

Lemma 22. *Let* $(A, \Sigma_A)$ *and* $(B, \Sigma_B)$ *be measurable spaces. Let* $f\colon A \times B \to [0,\infty]$ *be measurable and* $\kappa\colon A \to B$ *be a sub-probability kernel. Then,* $f'\colon A \to [0,\infty]$ *defined by*

$$f'(a) := \int\_{b \in B} f(a, b) \kappa(a) (db)$$

*is measurable.*

*Proof.* See Theorem 20 of [30].

Lemma 23. $\times$ *and* $\overline{\times}$ *preserve (sub-)probability kernels.*

*Proof.* Let $\kappa\colon A \to B$ and $\kappa'\colon A \to C$ be (sub-)probability kernels. The fact that $(\kappa \times \kappa')(a)(\cdot)$ is a (sub-)probability measure for all $a \in A$ is inherited from Lemma 2. It remains to show that $(\kappa \times \kappa')(\cdot)(S)$ is measurable for all $S \in \Sigma_{B\times C}$, with

$$(\kappa \times \kappa')(a)(S) = \int\_{b \in B} \int\_{c \in C} [(b, c) \in S] \kappa'(a)(dc) \kappa(a)(db)$$

By Lemma 22, $f'\colon A \times B \to [0,\infty]$ defined by $f'(a, b) = \int_{c\in C} [(b, c) \in S]\, \kappa'(a)(dc)$ is measurable, using the measurable function $f\colon (A \times B) \times C \to [0,\infty]$ defined by $f((a, b), c) = [(b, c) \in S]$. Again by Lemma 22, $\int_{b\in B} \int_{c\in C} [(b, c) \in S]\, \kappa'(a)(dc)\, \kappa(a)(db)$ is measurable.

Proving that for (sub-)probability kernels $\kappa\colon A \to B$ and $\kappa'\colon A \to C$, $\kappa \mathbin{\overline{\times}} \kappa'$ is a (sub-)probability kernel proceeds analogously.

Lemma 6. $\times$ *and* $\overline{\times}$ *for kernels preserve (sub-)probability and s-finite kernels, and are associative, left- and right-distributive.*

*Proof.* Associativity, left- and right-distributivity are inherited from respective properties of the product of measures established by Lemma 2. Sub-probability kernels are preserved by Lemma 23.

S-finite kernels are preserved because $\kappa \times \kappa' = \left(\sum_{i\in\mathbf{N}} \kappa_i\right) \times \left(\sum_{j\in\mathbf{N}} \kappa_j'\right) = \sum_{i,j\in\mathbf{N}} \kappa_i \times \kappa_j'$ (analogously for $\overline{\times}$).

### B Proofs for Semantics

Lemma 7. *For* $\Delta$ *as in the semantics of the while loop, and for each* $\sigma$ *and each* $S$*, the limit* $\lim_{n\to\infty} \Delta^n(\circlearrowleft)(\sigma)(S)$ *exists.*

*Proof.* In general, $0 \le \Delta^n(\circlearrowleft)(\sigma)(S) \le 1$. First, we restrict the allowed arguments for $\lim_{n\to\infty} \Delta^n(\circlearrowleft)(\sigma)(S)$ to only those $S$ with $\circlearrowleft \in S$. We prove by induction that $\Delta^{n+1}(\circlearrowleft) \le \Delta^n(\circlearrowleft)$, meaning $\forall \sigma\colon \forall S\colon \circlearrowleft \in S \implies \Delta^{n+1}(\circlearrowleft)(\sigma)(S) \le \Delta^n(\circlearrowleft)(\sigma)(S)$. Hence, $\Delta^n(\circlearrowleft)(\sigma)(S)$ is monotonically decreasing in $n$ and bounded below by $0$, which means that the limit must exist.

As a base case, we have $\Delta^1(\circlearrowleft)(\sigma)(S) \le 1 = \delta(\circlearrowleft)(S) = \Delta^0(\circlearrowleft)(\sigma)(S)$, because $\circlearrowleft \in S$. We proceed by induction with

$$\begin{aligned} \Delta^{n+1}(\circlearrowleft)(\sigma)(S) &= \left(\delta \mathbin{\overline{\times}} [\![e]\!] \mathrel{>\!=\!>} \lambda(\sigma', b).\begin{cases} [\![P]\!](\sigma') \mathrel{>\!>=} \Delta^{n}(\circlearrowleft) & b \neq 0 \\ \delta(\sigma') & b = 0 \end{cases}\right)(\sigma)(S) \\ &\le \left(\delta \mathbin{\overline{\times}} [\![e]\!] \mathrel{>\!=\!>} \lambda(\sigma', b).\begin{cases} [\![P]\!](\sigma') \mathrel{>\!>=} \Delta^{n-1}(\circlearrowleft) & b \neq 0 \\ \delta(\sigma') & b = 0 \end{cases}\right)(\sigma)(S) \\ &= \Delta^{n}(\circlearrowleft)(\sigma)(S) \end{aligned}$$

In the second line, we have used the induction hypothesis. This application is valid because $\kappa_2 \le \kappa_2'$ implies $\kappa_1 \mathrel{>\!=\!>} \kappa_2 \le \kappa_1 \mathrel{>\!=\!>} \kappa_2'$ (Lemma 21).

We proceed analogously when we restrict the allowed arguments for the kernel $\lim_{n\to\infty} \Delta^n(\circlearrowleft)(\sigma)(S)$ to only those $S$ with $\circlearrowleft \notin S$, proving $\Delta^{n+1}(\circlearrowleft) \ge \Delta^n(\circlearrowleft)$ for that case.

Lemma 8. *In the absence of exception states, and using sub-probability kernels instead of distribution transformers, the definition of the semantics of the while loop from [23] is equivalent to ours.*

Definition 6. *In [23], Kozen shows a different way of defining the semantics of the while loop. In our notation, and in terms of probability kernels instead of distribution transformers, that definition becomes*

$$[\![\textbf{while } e\ \{P\}]\!] = \sup_{n\in\mathbf{N}} \sum_{k=0}^{n} \left([\![\textbf{filter}(e)]\!] \mathrel{>\!=\!>} [\![P]\!]\right)^{k} \mathrel{>\!=\!>} [\![\textbf{filter}(\neg e)]\!]$$

*Here, exponentiation is in terms of Kleisli composition, i.e.* $\kappa^0 = \delta$ *and* $\kappa^{n+1} = \kappa \mathrel{>\!=\!>} \kappa^n$*. The sum and limit are meant pointwise. Furthermore, we define* **filter** *by the following expressions (note that* $[\![\textbf{filter}(e)]\!]$ *and* $[\![\textbf{filter}(\neg e)]\!]$ *are only sub-probability kernels, not probability kernels).*

$$\begin{aligned} [\![\textbf{filter}(e)]\!] &= \delta \mathbin{\overline{\times}} [\![e]\!] \mathrel{>\!=\!>} \lambda(\sigma, b).\begin{cases} \delta(\sigma) & b \neq 0 \\ \mathbf{0} & b = 0 \end{cases} \\ [\![\textbf{filter}(\neg e)]\!] &= \delta \mathbin{\overline{\times}} [\![e]\!] \mathrel{>\!=\!>} \lambda(\sigma, b).\begin{cases} \delta(\sigma) & b = 0 \\ \mathbf{0} & b \neq 0 \end{cases} \end{aligned}$$

To justify Lemma 8, we prove the more formal Lemma 24. Note that in the presence of exceptions (e.g. if $P$ is just **assert**(0)), Definition 6 does not make sense: exception states are passed through by $\mathrel{>\!=\!>}$, so every summand with $k \ge 1$ contributes the weight of the exception states again, and the sum is not bounded by 1.

Lemma 24. *For all* S *with* S ∩ X = ∅

$$\left(\sum_{k=0}^{n} \left([\![\textbf{filter}(e)]\!] \mathrel{>\!=\!>} [\![P]\!]\right)^{k} \mathrel{>\!=\!>} [\![\textbf{filter}(\neg e)]\!]\right)(\sigma)(S) = \Delta^{n+1}(\circlearrowleft)(\sigma)(S)$$

*Proof.* For n = 0, we have

$$\begin{aligned} &\left(\sum_{k=0}^{0} \left([\![\textbf{filter}(e)]\!] \mathrel{>\!=\!>} [\![P]\!]\right)^{k} \mathrel{>\!=\!>} [\![\textbf{filter}(\neg e)]\!]\right)(\sigma)(S) \\ &= \left(\left([\![\textbf{filter}(e)]\!] \mathrel{>\!=\!>} [\![P]\!]\right)^{0} \mathrel{>\!=\!>} [\![\textbf{filter}(\neg e)]\!]\right)(\sigma)(S) \\ &= \left(\delta \mathrel{>\!=\!>} [\![\textbf{filter}(\neg e)]\!]\right)(\sigma)(S) \\ &= [\![\textbf{filter}(\neg e)]\!](\sigma)(S) \\ &= \left(\delta \mathbin{\overline{\times}} [\![e]\!] \mathrel{>\!=\!>} \lambda(\sigma', b).\begin{cases} \delta(\sigma') & b = 0 \\ \mathbf{0} & b \neq 0 \end{cases}\right)(\sigma)(S) \\ &= \left(\delta \mathbin{\overline{\times}} [\![e]\!] \mathrel{>\!=\!>} \lambda(\sigma', b).\begin{cases} \delta(\sigma') & b = 0 \\ [\![P]\!](\sigma') \mathrel{>\!>=} \circlearrowleft & b \neq 0 \end{cases}\right)(\sigma)(S) && \text{since } S \cap \mathcal{X} = \emptyset \\ &= \Delta^{1}(\circlearrowleft)(\sigma)(S) \end{aligned}$$

For n ≥ 0, we have

$$\begin{aligned} &\left(\sum_{k=0}^{n+1} \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!]\right)^{k} \mathbin{>\!=\!>} [\![\mathtt{filter}(\neg e)]\!]\right)(\sigma)(S) \\ &= \left(\left(\sum_{k=0}^{n} \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!]\right)^{k+1} + \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!]\right)^{0}\right) \mathbin{>\!=\!>} [\![\mathtt{filter}(\neg e)]\!]\right)(\sigma)(S) \\ &= \left(\left(\sum_{k=0}^{n} \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!]\right)^{k+1} + \delta\right) \mathbin{>\!=\!>} [\![\mathtt{filter}(\neg e)]\!]\right)(\sigma)(S) \\ &= \left(\sum_{k=0}^{n} \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!]\right)^{k+1} \mathbin{>\!=\!>} [\![\mathtt{filter}(\neg e)]\!]\right)(\sigma)(S) + \left(\delta \mathbin{>\!=\!>} [\![\mathtt{filter}(\neg e)]\!]\right)(\sigma)(S) \quad \text{since } S \cap X = \emptyset \\ &= \left(\sum_{k=0}^{n} \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!]\right)^{k+1} \mathbin{>\!=\!>} [\![\mathtt{filter}(\neg e)]\!]\right)(\sigma)(S) + [\![\mathtt{filter}(\neg e)]\!](\sigma)(S) \\ &= \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!] \mathbin{>\!=\!>} \sum_{k=0}^{n} \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!]\right)^{k} \mathbin{>\!=\!>} [\![\mathtt{filter}(\neg e)]\!]\right)(\sigma)(S) + [\![\mathtt{filter}(\neg e)]\!](\sigma)(S) \\ &= \left([\![\mathtt{filter}(e)]\!] \mathbin{>\!=\!>} [\![P]\!] \mathbin{>\!=\!>} \Delta^{n+1}(\mathbf{0})\right)(\sigma)(S) + [\![\mathtt{filter}(\neg e)]\!](\sigma)(S) \\ &= \left(\delta \mathbin{\overline{\times}} [\![e]\!] \mathbin{>\!=\!>} \lambda(\sigma', b).\begin{cases} \delta(\sigma') & b = 0 \\ \left([\![P]\!] \mathbin{>\!=\!>} \Delta^{n+1}(\mathbf{0})\right)(\sigma') & b \neq 0 \end{cases}\right)(\sigma)(S) \\ &= \Delta^{n+2}(\mathbf{0})(\sigma)(S) \end{aligned}$$

In particular, we have used that left-distributivity does hold in this case, since S ∩ X = ∅.

### C Probability Kernel

In the following, we list lemmas that are crucial to prove Theorem 2 (restated for convenience).

Theorem 2. *The semantics of each expression* [[e]] *and statement* [[P]] *is indeed a probability kernel.*

Lemma 25. *Any measurable function* f : A → [0,∞] *can be viewed as an s-finite kernel* f : A → **1***, defined by* f(x)(∅) = 0 *and* f(x)(**1**) = f(x)*.*

*Proof.* We prove that f is an s-finite kernel. Let $A_\infty := \{x \in A \mid f(x) = \infty\}$. Since f is measurable, the set $A_\infty$ must be measurable. We can write

$$f(x)(S) = \sum_{i \in \mathbb{N}} [x \in A_\infty]\,[() \in S] + \sum_{i \in \mathbb{N}} f(x)\,[i \leq f(x) < i+1]\,[() \in S],$$

which is a sum of finite kernels, because the sets $A_\infty$ and $\{x \mid i \leq f(x) < i+1\} = f^{-1}([i, i+1))$ are measurable. Note that any sum of finite kernels can be rewritten as a sum of sub-probability kernels.

Lemma 26. *Let* κ′ : X → Y *and* κ′′ : X → Y *be kernels, and* f : X → **R** *measurable. Then,*

$$\kappa(x)(S) = \begin{cases} \kappa'(x)(S) & \text{if } f(x) = 0\\ \kappa''(x)(S) & \text{otherwise} \end{cases}$$

*is a kernel.*

*Proof.* Let f₌₀(x) := [f(x) = 0] and f≠₀(x) := [f(x) ≠ 0]. Then, κ = f₌₀ × κ′ + f≠₀ × κ′′. Viewing f₌₀ and f≠₀ as kernels X → **1** immediately gives the desired result.

Lemma 27. *Let* (A, Σ_A) *and* (B, Σ_B) *be measurable spaces. Let* {A_i}_{i∈I} *be a partition of* A *into measurable sets, for a countable set of indices* I*. Consider a function* f : A → B*. If the restriction* f|_{A_i} : A_i → B *is measurable for each* i ∈ I*, then* f *is measurable.*

Lemma 28. *Let* f : A → B *be measurable. Then* κ: A → B *with* κ(a) = δ(f(a)) *is a kernel.*

The following lemma is important to show that the semantics of the while loop is a probability kernel.

Lemma 29. *Suppose* $\{\kappa_n\}_{n \in \mathbb{N}}$ *is a sequence of (sub-)probability kernels* A → B*. Then, if the limit* $\kappa = \lim_{n\to\infty} \kappa_n$ *exists, it is also a (sub-)probability kernel. Here, the limit is pointwise, in the sense* $\forall a \in A\colon \forall S \in \Sigma_B\colon \kappa(a, S) = \lim_{n\to\infty} \kappa_n(a)(S)$*.*

*Proof.* For every a ∈ A, κ(a, ·) is a measure, because the pointwise limit of finite measures is a measure. For every S ∈ Σ_B, κ(·, S) is measurable, because the pointwise limit of measurable functions $f_n : A \to \mathbb{R}$ (with the Borel σ-algebra on **R**) is measurable.

### D Proofs for Consequences

In this section, we provide some proofs of consequences of our semantics, explained in Sect. 5.

Lemma 9. *For function* <sup>F</sup>(){**while** <sup>1</sup> {**skip**}; **return** <sup>0</sup>}*,*

$$\frac{1}{0} + F() \not\equiv F() + \frac{1}{0}$$

*Proof.* If we evaluate 1/0 first, we will only have weight on ⊥.

$$\begin{aligned} \left[\!\!\left[\tfrac{1}{0} + F()\right]\!\!\right] &= \left[\!\!\left[\tfrac{1}{0}\right]\!\!\right] \mathbin{\overline{\times}} [\![F()]\!] \mathbin{>\!=\!>} \lambda(x, y).\delta(x + y) \\ &= \delta(\perp) \mathbin{\overline{\times}} [\![F()]\!] \mathbin{>\!=\!>} \lambda(x, y).\delta(x + y) \\ &= \delta(\perp) \mathbin{>\!=\!>} \lambda(x, y).\delta(x + y) \\ &= \delta(\perp) \end{aligned}$$

If instead we first evaluate F(), we only have weight on the element representing nontermination, by an analogous calculation.

Lemma 10. *If* [[e1]](σ)(X) = [[e2]](σ)(X) = 0 *for all* σ*, then* e1 ⊕ e2 ≡ e2 ⊕ e1*, for any commutative operator* ⊕*.*

*Proof.*

$$\begin{aligned} [\![e_1 \oplus e_2]\!](\sigma)(S) &= \left([\![e_1]\!] \mathbin{\overline{\times}} [\![e_2]\!] \mathbin{>\!=\!>} \lambda(x, y).\delta(x \oplus y)\right)(\sigma)(S) \\ &= \int_{z \in \overline{\mathbb{R} \times \mathbb{R}}} \overline{\lambda(x, y).\delta(x \oplus y)}(z)(S)\, ([\![e_1]\!] \mathbin{\overline{\times}} [\![e_2]\!])(\sigma)(dz) \\ &= \int_{(x, y) \in \mathbb{R} \times \mathbb{R}} \delta(x \oplus y)(S)\, ([\![e_1]\!] \times [\![e_2]\!])(\sigma)(d(x, y)) \\ &= \int_{(y, x) \in \mathbb{R} \times \mathbb{R}} \delta(y \oplus x)(S)\, ([\![e_2]\!] \times [\![e_1]\!])(\sigma)(d(y, x)) \\ &= [\![e_2 \oplus e_1]\!](\sigma)(S) \end{aligned}$$

Here, we crucially rely on the absence of exceptions (for the third equality) and Fubini's Theorem (for the fourth equality).

Lemma 11. e1 ⊕ (e2 ⊕ e3) ≡ (e1 ⊕ e2) ⊕ e3*, for any associative operator* ⊕*.*

*Proof.* The important steps of the proof are the following.

$$\begin{aligned} [\![e_1 \oplus (e_2 \oplus e_3)]\!] &= [\![e_1]\!] \mathbin{\overline{\times}} [\![e_2 \oplus e_3]\!] \mathbin{>\!=\!>} \lambda(x, s).\delta(x \oplus s) \\ &= [\![e_1]\!] \mathbin{\overline{\times}} \left([\![e_2]\!] \mathbin{\overline{\times}} [\![e_3]\!] \mathbin{>\!=\!>} \lambda(y, z).\delta(y \oplus z)\right) \mathbin{>\!=\!>} \lambda(x, s).\delta(x \oplus s) \\ &= [\![e_1]\!] \mathbin{\overline{\times}} \left([\![e_2]\!] \mathbin{\overline{\times}} [\![e_3]\!]\right) \mathbin{>\!=\!>} \lambda(x, (y, z)).\delta(x \oplus y \oplus z) \\ &= \left([\![e_1]\!] \mathbin{\overline{\times}} [\![e_2]\!]\right) \mathbin{\overline{\times}} [\![e_3]\!] \mathbin{>\!=\!>} \lambda((x, y), z).\delta(x \oplus y \oplus z) \\ &= [\![(e_1 \oplus e_2) \oplus e_3]\!] \end{aligned}$$

Here, we make crucial use of associativity for the lifted product of measures in Lemma 6.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **How long, O Bayesian network, will I sample thee? A program analysis perspective on expected sampling times**

Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja

> RWTH Aachen University, Aachen, Germany kevin.batz@rwth-aachen.de, {benjamin.kaminski,katoen,matheja}@cs.rwth-aachen.de

**Abstract.** Bayesian networks (BNs) are probabilistic graphical models for describing complex joint probability distributions. The main problem for BNs is inference: Determine the probability of an event given observed evidence. Since exact inference is often infeasible for large BNs, popular approximate inference methods rely on sampling.

We study the problem of determining the expected time to obtain a single valid sample from a BN. To this end, we translate the BN together with observations into a probabilistic program. We provide proof rules that yield the exact expected runtime of this program in a fully automated fashion. We implemented our approach and successfully analyzed various real–world BNs taken from the Bayesian network repository.

**Keywords:** Probabilistic programs · Expected runtimes · Weakest preconditions · Program verification

### **1 Introduction**

*Bayesian networks* (BNs) are *probabilistic graphical models* representing joint probability distributions of sets of random variables with conditional dependencies. Graphical models are a popular and appealing modeling formalism, as they allow one to represent complex distributions succinctly in a human-readable way. BNs have been intensively studied at least since 1985 [43] and have a wide range of applications including machine learning [24], speech recognition [50], sports betting [11], gene regulatory networks [18], diagnosis of diseases [27], and finance [39].

*Probabilistic programs* are programs with the key ability to draw values at random. Seminal papers by Kozen from the 1980s consider formal semantics [32] as well as initial work on verification [33,47]. McIver and Morgan [35] build on this work to further weakest–precondition style verification for imperative probabilistic programs.

© The Author(s) 2018 A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 186–213, 2018. https://doi.org/10.1007/978-3-319-89884-1\_7

The interest in probabilistic programs has been rapidly growing in recent years [20,23]. Part of the reason for this déjà vu is their use for representing probabilistic graphical models [31] such as BNs. The full potential of modern probabilistic programming languages like Anglican [48], Church [21], Figaro [44], R2 [40], or Tabular [22] is that they enable rapid prototyping and obviate the need to manually provide inference methods tailored to an individual model.

*Probabilistic inference* is the problem of determining the probability of an event given observed evidence. It is a major problem for both BNs and probabilistic programs, and has been subject to intense investigations by both theoreticians and practitioners for more than three decades; see [31] for a survey. In particular, it has been shown that for probabilistic programs exact inference is highly undecidable [28], while for BNs both *exact inference* as well as *approximate inference* to an arbitrary precision are NP–hard [12,13]. In light of these complexity– theoretical hurdles, a popular way to analyze probabilistic graphical models as well as probabilistic programs is to gather a large number of independent and identically distributed (i.i.d. for short) samples and then do statistical reasoning on these samples. In fact, all of the aforementioned probabilistic programming languages support sampling based inference methods.

*Rejection sampling* is a fundamental approach to obtain valid samples from BNs with observed evidence. In a nutshell, this method first samples from the joint (unconditional) distribution of the BN. If the sample complies with all evidence, it is valid and accepted; otherwise it is rejected and one has to resample.
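Concretely, the rejection-sampling loop is only a few lines of code. The following Python sketch runs it on a small invented two-node network; the network, its probability entries, and all names are hypothetical, chosen purely for illustration:

```python
import random

def sample_bn(rng):
    """Draw one sample from the joint (unconditional) distribution of a
    hypothetical two-node network: R ~ Bernoulli(0.3), and G depends on R
    through an invented conditional probability table."""
    r = 1 if rng.random() < 0.3 else 0
    p_g1 = 0.9 if r == 1 else 0.2          # hypothetical CPT entry P(G = 1 | R)
    g = 1 if rng.random() < p_g1 else 0
    return {"R": r, "G": g}

def rejection_sample(evidence, rng):
    """Resample until the drawn sample complies with all observed evidence;
    also report how many attempts (samples) were needed."""
    attempts = 0
    while True:
        attempts += 1
        s = sample_bn(rng)
        if all(s[var] == val for var, val in evidence.items()):
            return s, attempts

rng = random.Random(42)
sample, attempts = rejection_sample({"G": 1}, rng)
```

The number of attempts is itself a random variable; its expected value is exactly the kind of quantity studied in this paper.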

Apart from rejection sampling, there are more sophisticated sampling techniques, which mainly fall in two categories: Markov Chain Monte Carlo (MCMC) and importance sampling. But while MCMC requires heavy hand–tuning and suffers from slow convergence rates on real–world instances [31, Chapter 12.3], virtually all variants of importance sampling rely again on rejection sampling [31,49].

A major problem with rejection sampling is that for poorly conditioned data, this approach might have to reject and resample very often in order to obtain just a single accepting sample. Even worse, being poorly conditioned need not be immediately evident for a given BN, let alone a probabilistic program. In fact, Gordon et al. [23, p. 177] point out that

"the main challenge in this setting [i.e. sampling based approaches] is that many samples that are generated during execution are ultimately rejected for not satisfying the observations."

If too many samples are rejected, the expected sampling time grows so large that sampling becomes infeasible. The expected sampling time of a BN is therefore a key figure for deciding whether sampling based inference is the method of choice.

*How Long, O Bayesian Network, will I Sample Thee?* More precisely, we use techniques from program verification to give an answer to the following question:

Given a Bayesian network with observed evidence, how long does it take in expectation to obtain a *single* sample that satisfies the observations?

**Fig. 1.** A simple Bayesian network.

As an example, consider the BN in Fig. 1 which consists of just three nodes (random variables) that can each assume values 0 or 1. Each node X comes with a conditional probability table determining the probability of X assuming some value given the values of all nodes Y that X depends on (i.e. X has an incoming edge from Y ), see [3, Appendix A.1] for detailed calculations. For instance, the probability that G assumes value 0, given that S and R both assume value 1, is 0.2. Note that this BN is parameterized by a ∈ [0, 1].

Now, assume that our observed evidence is the event G = 0 and we apply rejection sampling to obtain *one* accepting sample from this BN. Then our approach will yield that a rejection sampling algorithm will, on average, require

$$\frac{200a^2 - 40a - 460}{89a^2 - 69a - 21}$$

guard evaluations, random assignments, etc. until it obtains a single sample that complies with the observation G = 0 (the underlying runtime model is discussed in detail in Sect. 3.3). By examination of this function, we see that for large ranges of values of a the BN is rather well–behaved: For a ∈ [0.08, 0.78] the expected sampling time stays below 18. Above a = 0.95 the expected sampling time starts to grow rapidly up to 300.
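The quoted figures can be reproduced by evaluating the closed-form expression directly (a quick numerical sanity check; the sample points 0.5 and 1.0 are our own choice):

```python
def expected_sampling_time(a):
    # the closed-form expected sampling time quoted in the text:
    # (200a^2 - 40a - 460) / (89a^2 - 69a - 21)
    return (200 * a**2 - 40 * a - 460) / (89 * a**2 - 69 * a - 21)

mid = expected_sampling_time(0.5)   # inside the well-behaved range [0.08, 0.78]
top = expected_sampling_time(1.0)   # upper end of the parameter range
```

For a = 0.5 the value stays below 18, while for a = 1.0 it reaches 300, matching the growth described above.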

While 300 is still moderate, we will see later that expected sampling times of real–world BNs can be much larger. For some BNs, the expected sampling time even exceeded 10<sup>18</sup>, rendering sampling based methods infeasible. In this case, exact inference (despite NP–hardness) was a viable alternative (see Sect. 6).

*Our Approach.* We apply weakest precondition style reasoning à la McIver and Morgan [35] and Kaminski et al. [30] to analyze both expected outcomes and *expected runtimes* (ERT) of a *syntactic fragment of* pGCL, which we call the *Bayesian Network Language* (BNL). Note that since BNL is a syntactic fragment of pGCL, every BNL program is a pGCL program but *not vice versa*. The main restriction of BNL is that (in contrast to pGCL) loops are of a special form that prohibits undesired data flow across multiple loop iterations. While this restriction renders BNL incapable of, for instance, counting the number of loop iterations<sup>1</sup>, BNL is expressive enough to encode Bayesian networks with observed evidence.

For BNL, we develop dedicated proof rules to determine *exact* expected values and the *exact* ERT of any BNL program, including loops, without any user– supplied data, such as invariants [30,35], ranking or metering functions [19], (super)martingales [8–10], etc.

As a central notion behind these rules, we introduce f–*i.i.d.–ness* of probabilistic loops, a concept closely related to stochastic independence, that allows us to *rule out undesired parts of the data flow across loop iterations*. Furthermore, we show how every BN with observations is translated into a BNL program, such that


As a consequence, exact expected sampling times of BNs can be inferred by means of weakest precondition reasoning in a fully automated fashion. This can be seen as a first step towards formally evaluating the quality of a plethora of different sampling methods (cf. [31,49]) on source code level.

*Contributions.* To summarize, our main contributions are as follows:


*Outline.* We discuss related work in Sect. 2. Syntax and semantics of the probabilistic programming language pGCL are presented in Sect. 3. Our proof rules are introduced in Sect. 4 and applied to BNs in Sect. 5. Section 6 reports on experimental results and Sect. 7 concludes.

<sup>1</sup> An example of a program that is *not* expressible in BNL is given in Example 1.

### **2 Related Work**

While various techniques for formal reasoning about runtimes and expected outcomes of probabilistic programs have been developed, e.g. [6,7,17,25,38], none of them explicitly apply formal methods to reason about Bayesian networks on source code level. In the following, we focus on approaches close to our work.

*Weakest Preexpectation Calculus.* Our approach builds upon the expected runtime calculus [30], which is itself based on work by Kozen [32,33] and McIver and Morgan [35]. In contrast to [30], we develop specialized proof rules for a clearly specified program fragment *without* requiring user–supplied invariants. Since finding invariants often requires heavy calculations, our proof rules contribute towards simplifying and automating verification of probabilistic programs.

*Ranking Supermartingales.* Reasoning about almost–sure termination is often based on ranking (super)martingales (cf. [8,10]). In particular, Chatterjee et al. [9] consider the class of affine probabilistic programs for which linear ranking supermartingales exist (Lrapp); thus proving (positive<sup>2</sup>) almost–sure termination for all programs within this class. They also present a doubly–exponential algorithm to approximate ERTs of Lrapp programs. While all BNL programs lie within Lrapp, our proof rules yield *exact* ERTs as *expectations* (thus allowing for compositional proofs), in contrast to a single number for a fixed initial state.

*Bayesian Networks and Probabilistic Programs.* Bayesian networks are a—if not the most—popular probabilistic graphical model (cf. [4,31] for details) for reasoning about conditional probabilities. They are closely tied to (a fragment of) probabilistic programs. For example, Infer.NET [36] performs inference by compiling a probabilistic program into a Bayesian network. While correspondences between probabilistic graphical models, such as BNs, and probabilistic programs have been considered in the literature [21,23,37], we are not aware of a formal soundness proof for a translation from classical BNs into probabilistic programs including conditioning.

Conversely, some probabilistic programming languages such as Church [21], Stan [26], and R2 [40] directly perform inference on the program level using sampling techniques similar to those developed for Bayesian networks. Our approach is a step towards understanding sampling based approaches formally: We obtain the exact expected runtime required to generate a sample that satisfies all observations. This may ultimately be used to evaluate the quality of a plethora of proposed sampling methods for Bayesian inference (cf. [31,49]).

### **3 Probabilistic Programs**

We briefly present the probabilistic programming language that is used throughout this paper. Since our approach is embedded into weakest-precondition style approaches, we also recap calculi for reasoning about both expected outcomes and expected runtimes of probabilistic programs.

<sup>2</sup> Positive almost–sure termination means termination in finite expected time [5].

#### **3.1 The Probabilistic Guarded Command Language**

We enhance Dijkstra's Guarded Command Language [14,15] by a probabilistic construct, namely a random assignment. We thereby obtain a *probabilistic Guarded Command Language* (for a closely related language, see [35]).

Let Vars be a finite set of *program variables*. Moreover, let Q be the set of rational numbers, and let D (Q) be the set of discrete probability distributions over Q. The set of *program states* is given by Σ = { σ | σ : Vars → Q }.

A *distribution expression* μ is a function of type μ: Σ → D (Q) that takes a program state and maps it to a probability distribution on values from Q. We denote by <sup>μ</sup>σ the distribution obtained from applying <sup>σ</sup> to <sup>μ</sup>.

The probabilistic guarded command language (pGCL) is given by the grammar


$$C \;::=\; \mathtt{skip} \;\mid\; \mathtt{diverge} \;\mid\; x :\approx \mu \;\mid\; C;\, C \;\mid\; \mathtt{if}\,(\varphi)\,\{C\}\ \mathtt{else}\ \{C\} \;\mid\; \mathtt{while}\,(\varphi)\,\{C\} \;\mid\; \mathtt{repeat}\,\{C\}\,\mathtt{until}\,(\varphi)$$

where x ∈ Vars is a program variable, μ is a distribution expression, and ϕ is a Boolean expression guarding a choice or a loop. A pGCL program that contains neither diverge, nor while, nor repeat–until loops is called loop–free.

For σ ∈ Σ and an arithmetical expression E over Vars, we denote by σ(E) the evaluation of E in σ, i.e. the value that is obtained by evaluating E after replacing any occurrence of any program variable x in E by the value σ(x). Analogously, we denote by σ(ϕ) the evaluation of a guard ϕ in state σ to either true or false. Furthermore, for a value v ∈ Q we write σ [x → v] to indicate that we set program variable x to value v in program state σ, i.e.<sup>3</sup>

$$\sigma \left[ x \mapsto v \right] \;=\; \lambda y.\begin{cases} v, & \text{if } y = x \\ \sigma(y), & \text{if } y \neq x \end{cases}$$

We use the Iverson bracket notation to associate with each guard its according indicator function. Formally, the Iverson bracket [ϕ] of ϕ is the function that maps a state σ to 1 if σ(ϕ) = true, and to 0 otherwise.

Let us briefly go over the pGCL constructs and their effects: skip does not alter the current program state. The program diverge is an infinite busy loop, thus takes infinite time to execute. It returns no final state whatsoever.

The random assignment x :≈ μ is (a) the only construct that can actually alter the program state and (b) the only construct that may introduce random

<sup>3</sup> We use λ–expressions to construct functions: Function λX**.** e applied to an argument α evaluates to e in which every occurrence of X is replaced by α.

behavior into the computation. It takes the current program state σ, then *samples* a value <sup>v</sup> from probability distribution <sup>μ</sup>σ, and then assigns <sup>v</sup> to program variable x. An example of a random assignment is

$$x :\approx \tfrac{1}{2}\cdot\langle 5\rangle + \tfrac{1}{6}\cdot\langle y+1\rangle + \tfrac{1}{3}\cdot\langle y-1\rangle\,.$$

If the current program state is σ, then the program state is altered to either σ [x → 5] with probability <sup>1</sup>/2, or to σ [x → σ(y) + 1] with probability <sup>1</sup>/6, or to σ [x → σ(y) − 1] with probability <sup>1</sup>/3. The remainder of the pGCL constructs are standard programming language constructs.
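The effect of this random assignment can be made concrete by enumerating the successor states with their weights and computing expected values exactly (a Python sketch; states are plain dictionaries, and the initial value y = 3 is an arbitrary choice):

```python
from fractions import Fraction as F

def assign_x(state):
    """Exact successor-state distribution of
    x :≈ 1/2·<5> + 1/6·<y+1> + 1/3·<y-1>, as (weight, state) pairs."""
    y = state["y"]
    return [(F(1, 2), {**state, "x": 5}),
            (F(1, 6), {**state, "x": y + 1}),
            (F(1, 3), {**state, "x": y - 1})]

def expected(dist, f):
    """Expected value of a function f of the state over a discrete distribution."""
    return sum(w * f(s) for w, s in dist)

initial = {"y": 3, "x": 0}
e_x = expected(assign_x(initial), lambda s: s["x"])   # exact expectation of x
total = sum(w for w, _ in assign_x(initial))          # total probability mass
```

For y = 3 this yields E[x] = 1/2·5 + 1/6·4 + 1/3·2 = 23/6.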

In general, a pGCL program C is executed on an input state and yields a *probability distribution* over final states due to possibly occurring random assignments inside of C. We denote that resulting distribution by ⟦C⟧σ. Strictly speaking, programs can yield *subdistributions*, i.e. probability distributions whose total mass may be below 1. The "missing" probability mass represents the probability of nontermination. Let us conclude our presentation of pGCL with an example:

*Example 1 (Geometric Loop).* Consider the program C*geo* given by

$$\begin{aligned} &x :\approx 0;\quad c :\approx \tfrac{1}{2}\cdot\langle 0\rangle + \tfrac{1}{2}\cdot\langle 1\rangle; \\ &\mathtt{while}\,(c = 1)\,\{\, x :\approx x + 1;\; c :\approx \tfrac{1}{2}\cdot\langle 0\rangle + \tfrac{1}{2}\cdot\langle 1\rangle \,\} \end{aligned}$$

This program basically keeps flipping coins until it flips, say, heads (c = 0). In x it counts the number of unsuccessful trials.<sup>4</sup> In effect, it almost surely sets c to 0 and moreover it establishes a geometric distribution on x. The resulting distribution is given by

$$[\![C_{geo}]\!]_{\sigma}(\tau) \;=\; \sum_{n=0}^{\omega} \left[\tau = \sigma\left[c, x \mapsto 0, n\right]\right] \cdot \frac{1}{2^{n+1}}\,.$$
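This behavior is easy to replay operationally (a simulation sketch; the fair coin is modeled with `randint`, and the truncation bound 50 used for the exact mass computation is arbitrary):

```python
import random
from fractions import Fraction as F

def run_geo(rng):
    """One execution of C_geo: flip a fair coin until it shows 0,
    counting the number of unsuccessful trials in x."""
    x = 0
    c = rng.randint(0, 1)
    while c == 1:
        x += 1
        c = rng.randint(0, 1)
    return x

# The resulting distribution on x is geometric: P(x = n) = 1/2^(n+1).
# Its mass on {0, ..., 49} is 1 - 1/2^50: the program terminates almost
# surely even though it admits arbitrarily long executions.
mass = sum(F(1, 2 ** (n + 1)) for n in range(50))
```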

#### **3.2 The Weakest Preexpectation Transformer**

We now present the weakest preexpectation transformer wp for reasoning about expected outcomes of executing probabilistic programs in the style of McIver and Morgan [35]. Given a random variable f mapping program states to reals, it allows us to reason about the expected value of f after executing a probabilistic program on a given state.

**Expectations.** The random variables the wp transformer acts upon are taken from a set of so-called expectations, a term coined by McIver and Morgan [35]:

<sup>4</sup> This counting is also the reason that C*geo* is an example of a program that is not expressible in our BNL language that we present later.

**Definition 1 (Expectations).** *The set of expectations* E *is defined as*

$$\mathbb{E} \;=\; \left\{\, f \;\middle|\; f\colon \Sigma \to \mathbb{R}^{\infty}_{\geq 0} \,\right\}.$$

*We will use the notation* f[x/E] *to indicate the* replacement *of every occurrence of* x *in* f *by* E*. Since* x*, however, does not actually occur in* f*, we more formally define* <sup>f</sup>[x/E] = λσ*.* <sup>f</sup>(<sup>σ</sup> [<sup>x</sup> → <sup>σ</sup>(E)])*.*

*A complete partial order* ⪯ *on* E *is obtained by point–wise lifting the canonical total order* ≤ *on* $\mathbb{R}^{\infty}_{\geq 0}$*, i.e.*

$$f_1 \preceq f_2 \quad \text{iff} \quad \forall \sigma \in \Sigma\colon\; f_1(\sigma) \leq f_2(\sigma)\,.$$

*Its least element is given by* λσ**.** 0*, which we (by slight abuse of notation) also denote by* 0*. Suprema are constructed pointwise, i.e. for* S ⊆ E *the supremum* sup S *is given by* $\sup S = \lambda\sigma.\ \sup_{f \in S} f(\sigma)$*.*

We allow expectations to map only to positive reals, so that we have a complete partial order readily available, which would not be the case for expectations of type Σ → R ∪ {−∞, +∞}. A wp calculus that *can* handle expectations of such type needs more technical machinery and cannot make use of this underlying natural partial order [29]. Since we want to reason about ERTs which are by nature non–negative, we will not need such complicated calculi.

Notice that we use a slightly different definition of expectations than McIver and Morgan [35], as we allow for *unbounded* expectations, whereas [35] requires that expectations are *bounded*. This however would prevent us from capturing ERTs, which are potentially unbounded.

**Expectation Transformers.** For reasoning about the expected value of f ∈ E after execution of C, we employ a backward–moving weakest preexpectation transformer wp⟦C⟧ : E → E, that maps a *postexpectation* f ∈ E to a *preexpectation* wp⟦C⟧(f) ∈ E, such that wp⟦C⟧(f)(σ) is the expected value of f after executing C on initial state σ. Formally, if C executed on input σ yields final distribution ⟦C⟧σ, then the *weakest preexpectation* wp⟦C⟧(f) *of* C *with respect to postexpectation* f *is given by*

$$\text{wp}\left[\![C]\!\right](f)\left(\sigma\right) = \int\_{\Sigma} f \, d\left[\![C]\!\right]\_{\sigma} \,, \tag{1}$$

where we denote by $\int_A h \, d\nu$ the expected value of a random variable $h\colon A \to \mathbb{R}^{\infty}_{\geq 0}$ with respect to a probability distribution ν : A → [0, 1]. Weakest preexpectations can be defined in a very systematic way:

**Definition 2 (The wp Transformer** [35]**).** *The weakest preexpectation transformer* wp: pGCL → E → E *is defined by induction on all* pGCL *programs according to the rules in Table 1. We call* $F_f(X) = [\neg\varphi] \cdot f + [\varphi] \cdot \mathrm{wp}[\![C]\!](X)$ *the* wp*–*characteristic functional *of the loop* while (ϕ) {C} *with respect to postexpectation* f*. For a given* wp*–characteristic functional* $F_f$*, we call the sequence* $\{F_f^n(0)\}_{n \in \mathbb{N}}$ *the* orbit *of* $F_f$*.*


**Table 1.** Rules for the wp–transformer.

| $C$ | $\mathrm{wp}[\![C]\!](f)$ |
|---|---|
| $\mathtt{skip}$ | $f$ |
| $\mathtt{diverge}$ | $0$ |
| $x :\approx \mu$ | $\lambda\sigma.\ \int_{\mathbb{Q}} \left(\lambda v.\ f[x/v](\sigma)\right)\, d\mu_\sigma$ |
| $C_1;\, C_2$ | $\mathrm{wp}[\![C_1]\!]\big(\mathrm{wp}[\![C_2]\!](f)\big)$ |
| $\mathtt{if}\,(\varphi)\,\{C_1\}\ \mathtt{else}\ \{C_2\}$ | $[\varphi] \cdot \mathrm{wp}[\![C_1]\!](f) + [\neg\varphi] \cdot \mathrm{wp}[\![C_2]\!](f)$ |
| $\mathtt{while}\,(\varphi)\,\{C'\}$ | $\mathrm{lfp}\ X.\ [\neg\varphi] \cdot f + [\varphi] \cdot \mathrm{wp}[\![C']\!](X)$ |
| $\mathtt{repeat}\,\{C'\}\,\mathtt{until}\,(\varphi)$ | $\mathrm{wp}[\![C';\, \mathtt{while}\,(\varphi)\,\{C'\}]\!](f)$ |

Let us briefly go over the definitions in Table 1: For skip the program state is not altered and thus the expected value of f is just f. The program diverge will never yield any final state. The distribution over the final states yielded by diverge is thus the null distribution $\nu_0(\tau) = 0$, that assigns probability 0 to *every* state. Consequently, the expected value of f after execution of diverge is given by $\int_\Sigma f\, d\nu_0 = \sum_{\tau \in \Sigma} 0 \cdot f(\tau) = 0$.

The rule for the random assignment x :≈ μ is a bit more technical: Let the current program state be σ. Then for every value v ∈ Q, the random assignment assigns <sup>v</sup> to <sup>x</sup> with probability <sup>μ</sup>σ(v), where <sup>σ</sup> is the current program state. The value of f after assigning v to x is f(σ [x → v]) = f[x/v](σ) and therefore the expected value of f after executing the random assignment is given by

$$\sum\_{v \in \mathbb{Q}} \mu\_{\sigma}(v) \cdot f[x/v](\sigma) \, = \int\_{\mathbb{Q}} \left(\lambda v \bullet f[x/v](\sigma)\right) \, d\mu\_{\sigma} \, .$$

Expressed as a function of σ, the latter yields precisely the definition in Table 1.

The definition for the conditional choice if (ϕ) {C1} else {C2} is not surprising: if the current state satisfies ϕ, we have to opt for the weakest preexpectation of C1, whereas if it does not satisfy ϕ, we have to choose the weakest preexpectation of C2. This yields precisely the definition in Table 1.

The definition for the sequential composition C1; C2 is also straightforward: We first determine wp⟦C2⟧(f) to obtain the expected value of f after executing C2. Then we mentally prepend the program C2 by C1 and therefore determine the expected value of wp⟦C2⟧(f) after executing C1. This gives the weakest preexpectation of C1; C2 with respect to postexpectation f.

The definition for the while loop makes use of a least fixed point, which is a standard construction in program semantics. Intuitively, the fixed point iteration of the wp–characteristic functional, given by $0,\ F_f(0),\ F_f^2(0),\ F_f^3(0),\ \ldots$, corresponds to the portion of the expected value of f after termination of the loop that can be collected within at most 0, 1, 2, 3, ... loop guard evaluations. The Kleene Fixed Point Theorem [34] ensures that this iteration converges to the least fixed point, i.e.

$$\sup_{n \in \mathbb{N}} F_f^n(0) \;=\; \mathrm{lfp}\, F_f \;=\; \mathrm{wp}[\![\mathtt{while}\,(\varphi)\,\{C\}]\!](f)\,.$$

By inspection of the above equality, we see that the least fixed point is exactly the construct that we want for while loops, since $\sup_{n \in \mathbb{N}} F_f^n(0)$ in principle allows the loop to run for any number of iterations, which captures precisely the semantics of a while loop, where the number of loop iterations is—in contrast to e.g. for loops—not determined upfront.
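This fixed point iteration can be carried out concretely. The sketch below does so for the geometric loop of Example 1 with postexpectation f = 1, whose preexpectation is the loop's termination probability (assumption: since f ignores x, an expectation here is fully described by its values at c = 0 and c = 1, so a pair of numbers suffices):

```python
def wp_char_iterate(n):
    """Kleene iteration of the wp-characteristic functional of
    while (c = 1) { x :≈ x+1; c :≈ 1/2·<0> + 1/2·<1> }
    with postexpectation f = 1 (i.e. the termination probability)."""
    X = (0.0, 0.0)                    # the least element 0 of the orbit
    for _ in range(n):
        # F_f(X) = [c != 1]·1 + [c = 1]·wp[body](X); the loop body
        # re-draws c uniformly, so wp[body](X)(c=1) = (X(0) + X(1)) / 2
        X = (1.0, 0.5 * X[0] + 0.5 * X[1])
    return X

orbit_30 = wp_char_iterate(30)   # approaches the least fixed point (1, 1)
```

The orbit converges to (1, 1), reflecting that the loop terminates almost surely from every initial state.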

Finally, since repeat {C} until (ϕ) is syntactic sugar for C; while (ϕ) {C}, we simply define the weakest preexpectation of the former as the weakest preexpectation of the latter. Let us conclude our study of the effects of the wp transformer by means of an example:

*Example 2.* Consider the following program C:

$$\begin{aligned} &c :\approx \tfrac{1}{3}\cdot\langle 0\rangle + \tfrac{2}{3}\cdot\langle 1\rangle; \\ &\mathtt{if}\,(c = 0)\,\{\, x :\approx \tfrac{1}{2}\cdot\langle 5\rangle + \tfrac{1}{6}\cdot\langle y+1\rangle + \tfrac{1}{3}\cdot\langle y-1\rangle \,\}\ \mathtt{else}\ \{\mathtt{skip}\}\,. \end{aligned}$$

Say we wish to reason about the expected value of x + c after execution of the above program. We can do so by calculating wp⟦C⟧(x + c) using the rules in Table 1. This calculation in the end yields wp⟦C⟧(x + c) = <sup>3y+26</sup>/18. The expected valuation of the expression x + c after executing C is thus <sup>3y+26</sup>/18. Note that x + c can be thought of as an expression that is evaluated in the final states after execution, whereas <sup>3y+26</sup>/18 must be evaluated in the initial state before execution of C.

**Healthiness Conditions of wp.** The wp transformer enjoys several useful properties, sometimes called *healthiness conditions* [35]. Two of these healthiness conditions that we will make heavy use of are given below:

**Theorem 1 (Healthiness Conditions for the wp Transformer** [35]**).** *For all* C ∈ pGCL*,* $f_1, f_2 \in \mathbb{E}$*, and* $a \in \mathbb{R}_{\geq 0}$*, the following holds:*
$$\textit{1.}\quad \mathsf{wp}\left[C\right](a \cdot f_1) \;=\; a \cdot \mathsf{wp}\left[C\right](f_1) \qquad \textit{(scaling by constants)}$$

$$\textit{2.}\quad \mathsf{wp}\left[C\right](f_1 + f_2) \;=\; \mathsf{wp}\left[C\right](f_1) + \mathsf{wp}\left[C\right](f_2) \qquad \textit{(additivity)}$$


#### **3.3 The Expected Runtime Transformer**

While for deterministic programs we can speak of *the* runtime of a program on a given input, the situation is different for probabilistic programs: For those we instead have to speak of the *expected runtime* (ERT). Notice that the ERT can be finite (even constant) while the program may still admit infinite executions. An example of this is the geometric loop in Example 1.

A wp–like transformer designed specifically for reasoning about ERTs is the ert transformer [30]. Like wp, it is of type $\mathsf{ert}\left[C\right]\colon \mathbb{E} \to \mathbb{E}$ and it can be shown that


**Table 2.** Rules for the ert–transformer.

$\mathsf{ert}\left[C\right](0)(\sigma)$ is precisely the *expected runtime of executing* C *on input* σ. More generally, if $f\colon \Sigma \to \mathbb{R}^{\infty}_{\geq 0}$ measures the time that is needed after executing C (thus f is evaluated in the final states after termination of C), then $\mathsf{ert}\left[C\right](f)(\sigma)$ is the expected time that is needed to run C on input σ and then let time f pass. For a more in–depth treatment of the ert transformer, see [30, Sect. 3]. The transformer is defined as follows:

**Definition 3 (The ert Transformer** [30]**).** *The expected runtime transformer* ert: pGCL → E → E *is defined by induction on all* pGCL *programs according to the rules given in Table 2. We call* $F_f(X) = 1 + [\neg\varphi]\cdot f + [\varphi]\cdot\mathsf{ert}\left[C\right](X)$ *the* ert*–*characteristic functional *of the loop* while (ϕ) {C} *with respect to postexpectation* f*. As with* wp*, for a given* ert*–characteristic functional* $F_f$*, we call the sequence* $\{F_f^n(0)\}_{n\in\mathbb{N}}$ *the* orbit of $F_f$*. Notice that*

$$\mathsf{ert}\left[\mathtt{while}\,(\varphi)\,\{C\}\right](f) \;=\; \operatorname{lfp} F_f \;=\; \sup_{n \in \mathbb{N}} F_f^n(0).$$

The rules for ert are very similar to the rules for wp. The runtime model we assume is that skip statements, random assignments, and guard evaluations for both conditional choice and while loops cost one unit of time. This runtime model can easily be adapted to count only the number of loop iterations or only the number of random assignments, etc. We conclude with a strong connection between the wp and the ert transformer that is crucial in our proofs:
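Under this runtime model, an expected runtime can be checked against a straightforward simulation. As a hypothetical example, consider the geometric loop while (c = 1) {c :≈ 1/2·⟨0⟩ + 1/2·⟨1⟩}: starting from c = 1, the ert rules yield 1 + (1 + 1)/(1 − 1/2) = 5 time units, counting one unit per guard evaluation and one per random assignment:

```python
import random

# Monte Carlo sanity check of the runtime model (random assignments and guard
# evaluations each cost one time unit) on a hypothetical geometric loop:
#   while (c = 1) { c :~ 1/2<0> + 1/2<1> }
# Starting from c = 1 the expected runtime is 5.
def run_once(rng):
    c, time = 1, 0
    while True:
        time += 1                # guard evaluation costs 1
        if c != 1:
            return time
        c = rng.choice([0, 1])   # random assignment ...
        time += 1                # ... costs 1 as well

rng = random.Random(0)
n = 100_000
estimate = sum(run_once(rng) for _ in range(n)) / n
print(estimate)                  # close to 5
```

With N denoting the geometrically distributed number of loop iterations (mean 2), each run costs 2N + 1 units, whose expectation is indeed 5.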

**Theorem 2 (Decomposition of ert** [41]**).** *For any* C ∈ pGCL *and* f ∈ E*,*

$$\mathsf{ert}\left[C\right](f) \;=\; \mathsf{ert}\left[C\right](0) + \mathsf{wp}\left[C\right](f).$$

### **4 Expected Runtimes of i.i.d. Loops**

We derive a proof rule that allows us to determine *exact ERTs of independent and identically distributed loops* (or *i.i.d. loops* for short). Intuitively, a loop

**Fig. 2.** An i.i.d. loop sampling a point within a circle uniformly at random using rejection sampling. The picture on the right–hand side visualizes the procedure: In each iteration a point (×) is sampled. If we obtain a point within the white area inside the circle, we terminate. Otherwise, i.e. if we obtain a point within the gray area outside the circle, we resample.

is i.i.d. if the distributions of states that are reached at the end of different loop iterations are equal. This is the case whenever there is no data flow across different iterations. In the non–probabilistic case, such loops either terminate after exactly one iteration or never. This is different for probabilistic programs.

As a running example, consider the program C*circle* in Fig. 2. C*circle* samples a point within a circle with center (5, 5) and radius r = 5 uniformly at random using rejection sampling. In each iteration, it samples a point (x, y) <sup>∈</sup> [0,..., 10]<sup>2</sup> within the square (with some fixed precision). The loop ensures that we resample if a sample is not located within the circle. Our proof rule will allow us to systematically determine the ERT of this loop, i.e. the average amount of time required until a single point within the circle is sampled.
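For intuition, a continuous variant of this rejection sampler (ignoring the fixed precision, so only an approximation of the program above) accepts with probability π·5²/10² = π/4 per iteration, so the expected number of iterations is 4/π ≈ 1.27:

```python
import random

# Rejection sampling in the spirit of Fig. 2, but with a continuous uniform
# sample over the square [0,10]^2: accept iff the point lies in the circle with
# center (5,5) and radius 5. The acceptance probability is the area ratio
# pi*5^2 / 10^2 = pi/4, so the expected number of iterations is 4/pi ~ 1.273.
def sample_point(rng):
    trials = 0
    while True:
        trials += 1
        x, y = rng.uniform(0, 10), rng.uniform(0, 10)
        if (x - 5) ** 2 + (y - 5) ** 2 <= 25:
            return (x, y), trials

rng = random.Random(42)
n = 100_000
mean_trials = sum(sample_point(rng)[1] for _ in range(n)) / n
print(mean_trials)               # close to 4/pi ~ 1.273
```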

Towards obtaining such a proof rule, we first present a syntactic notion of the i.i.d. property. It relies on expectations that are not affected by a pGCL program:

**Definition 4.** *Let* C ∈ pGCL *and* f ∈ E*. Moreover, let Mod* (C) *denote the set of all variables that occur on the left–hand side of an assignment in* C*, and let Vars*(f) *be the set of all variables that "occur in* f*", i.e. formally*

$$x \in \mathsf{Vars}(f) \qquad \text{iff} \qquad \exists \sigma \, \exists \, v, v' \colon \quad f(\sigma \, [x \mapsto v]) \neq \, f(\sigma \, [x \mapsto v']).$$

*Then* f *is* unaffected *by* C*, denoted* $f \perp C$*, iff Vars*(f) ∩ *Mod* (C) = ∅*.*

We are interested in expectations that are unaffected by pGCL programs because of a simple yet useful observation: If $g \perp C$, then g *can be treated like a constant* w.r.t. the transformer wp (i.e. like the a in Theorem 1 (1)). For our running example C*circle* (see Fig. 2), the expectation $f = \mathsf{wp}\left[C_{body}\right]([x + y \le 10])$ is unaffected by the loop body $C_{body}$ of C*circle*. Consequently, we have $\mathsf{wp}\left[C_{body}\right](f) = f \cdot \mathsf{wp}\left[C_{body}\right](1) = f$. In general, we obtain the following property:

**Lemma 1 (Scaling by Unaffected Expectations).** *Let* C ∈ pGCL *and* f, g ∈ E*. Then* $g \perp C$ *implies* $\mathsf{wp}\left[C\right](g \cdot f) = g \cdot \mathsf{wp}\left[C\right](f)$*.*

*Proof.* By induction on the structure of C. See [3, Appendix A.2].

We develop a proof rule that only requires that both the probability of the guard evaluating to true after one iteration of the loop body (i.e. $\mathsf{wp}\left[C\right]([\varphi])$) and the expected value of [¬ϕ] · f after one iteration (i.e. $\mathsf{wp}\left[C\right]([\neg\varphi] \cdot f)$) are unaffected by the loop body. We thus define the following:

**Definition 5 (***f–***Independent and Identically Distributed Loops).** *Let* <sup>C</sup> <sup>∈</sup> pGCL*,* <sup>ϕ</sup> *be a guard, and* <sup>f</sup> <sup>∈</sup> <sup>E</sup>*. Then we call the loop* while (ϕ) {C} f*–*independent and identically distributed *(or* f*–*i.i.d. *for short), if both*

> $\mathsf{wp}\left[C\right]([\varphi]) \perp C$ *and* $\mathsf{wp}\left[C\right]([\neg\varphi] \cdot f) \perp C$.

*Example 3.* Our example program C*circle* (see Fig. 2) is f–i.i.d. for all f ∈ E. This is due to the fact that

$$\mathsf{wp}\left[C_{body}\right]\left(\left[(x-5)^2 + (y-5)^2 > 25\right]\right) \;=\; \frac{48}{121} \;\perp\; C_{body} \qquad \text{(by Table 1)}$$

and (again for some fixed precision p ∈ N \ {0})

$$\begin{aligned} &\text{wpf } \left[C\_{body}\right] \left(\left[(x-5)^2 + (y-5)^2 > 25\right] \cdot f\right) \\ &= \frac{1}{121} \cdot \sum\_{i=0}^{10p} \sum\_{j=0}^{10p} \left[ (i/p - 5)^2 + (i/p - 5)^2 > 25 \right] \cdot f\left[x/(i/p), y/(i/p)\right] \not\equiv C\_{body}. \quad \triangle \alpha \end{aligned}$$

Our main technical lemma shows that we can express the orbit of the wp–characteristic functional as a partial geometric series:

**Lemma 2 (Orbits of** *f–***i.i.d. Loops).** *Let* C ∈ pGCL*,* ϕ *be a guard,* f ∈ E *such that the loop* while (ϕ) {C} *is* f*–i.i.d., and let* $F_f$ *be the corresponding* wp*–characteristic functional. Then for all* n ∈ N \ {0}*, it holds that*

$$F_f^n(0) \;=\; [\varphi]\cdot\mathsf{wp}\left[C\right]\left([\neg\varphi]\cdot f\right)\cdot\sum_{i=0}^{n-2}\mathsf{wp}\left[C\right]\left([\varphi]\right)^i \;+\; [\neg\varphi]\cdot f.$$

*Proof.* By use of Lemma 1, see [3, Appendix A.3].

Using this precise description of the wp orbits, we now establish proof rules for f–i.i.d. loops, first for wp and later for ert.

**Theorem 3 (Weakest Preexpectations of** *f–***i.i.d. Loops).** *Let* C ∈ pGCL*,* <sup>ϕ</sup> *be a guard, and* <sup>f</sup> <sup>∈</sup> <sup>E</sup>*. If the loop* while (ϕ) {C} *is* <sup>f</sup>*–i.i.d., then*

$$\mathsf{wp}\left[\mathtt{while}\,(\varphi)\,\{C\}\right](f) \;=\; [\varphi]\cdot\frac{\mathsf{wp}\left[C\right]\left([\neg\varphi]\cdot f\right)}{1-\mathsf{wp}\left[C\right]\left([\varphi]\right)} + [\neg\varphi]\cdot f,$$

*where we define* 0/0 := 0*.*

*Proof.* We have

$$\begin{aligned} &\mathsf{wp}\left[\mathtt{while}\,(\varphi)\,\{C\}\right](f)\\ &= \sup_{n\in\mathbb{N}} F_f^n(0) &&\text{(by Definition 2)}\\ &= \sup_{n\in\mathbb{N}}\; [\varphi]\cdot\mathsf{wp}\left[C\right]\left([\neg\varphi]\cdot f\right)\cdot\sum_{i=0}^{n-2}\mathsf{wp}\left[C\right]\left([\varphi]\right)^i + [\neg\varphi]\cdot f &&\text{(by Lemma 2)}\\ &= [\varphi]\cdot\mathsf{wp}\left[C\right]\left([\neg\varphi]\cdot f\right)\cdot\sum_{i=0}^{\omega}\mathsf{wp}\left[C\right]\left([\varphi]\right)^i + [\neg\varphi]\cdot f. &&(\dagger) \end{aligned}$$

The preexpectation (†) is to be evaluated in some state σ, for which we have two cases: The first case is when $\mathsf{wp}\left[C\right]([\varphi])(\sigma) < 1$. Using the closed form of the geometric series, i.e. $\sum_{i=0}^{\omega} q^i = \frac{1}{1-q}$ if $|q| < 1$, we get

$$\begin{aligned} &[\varphi](\sigma)\cdot\mathsf{wp}\left[C\right]\left([\neg\varphi]\cdot f\right)(\sigma)\cdot\sum_{i=0}^{\omega}\mathsf{wp}\left[C\right]\left([\varphi]\right)(\sigma)^i + [\neg\varphi](\sigma)\cdot f(\sigma) &&(\dagger\ \text{instantiated in}\ \sigma)\\ &= [\varphi](\sigma)\cdot\frac{\mathsf{wp}\left[C\right]\left([\neg\varphi]\cdot f\right)(\sigma)}{1-\mathsf{wp}\left[C\right]\left([\varphi]\right)(\sigma)} + [\neg\varphi](\sigma)\cdot f(\sigma). &&\text{(closed form of the geometric series)} \end{aligned}$$

The second case is when wp -C ([ϕ]) (σ) = 1. This case is technically slightly more involved. The full proof can be found in [3, Appendix A.4].
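For a concrete instance of Theorem 3, consider again the hypothetical loop while (x2 < x1) {x2 :≈ Unif[1...6]} with f = x2, which is f–i.i.d. since the body resamples x2 independently of the previous iteration. In a state where the guard holds, the closed form can be evaluated with exact rational arithmetic:

```python
from fractions import Fraction

# Evaluating the closed form of Theorem 3 on the hypothetical f-i.i.d. loop
#   while (x2 < x1) { x2 :~ Unif[1..6] }     with postexpectation f = x2,
# in an initial state where the guard holds (so the [not guard]*f summand drops).
def closed_form(x1):
    # wp[C]([guard]): probability that the freshly sampled x2 is still < x1
    p_guard = Fraction(x1 - 1, 6)
    # wp[C]([not guard] * f): expected value of [x2 >= x1] * x2 after one iteration
    succ_mass = Fraction(sum(j for j in range(x1, 7)), 6)
    return succ_mass / (1 - p_guard)

print(closed_form(4))   # 5
```

For x1 = 4 the final value of x2 is uniform on {4, 5, 6}, so the rule yields exactly 5, matching the fixed point iteration.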

We now derive a similar proof rule for the ERT of an <sup>f</sup>–i.i.d. loop while (ϕ) {C}.

**Theorem 4 (Proof Rule for ERTs of** *f–***i.i.d. Loops).** *Let* C ∈ pGCL*,* ϕ *be a guard, and* f ∈ E *such that all of the following conditions hold:*

*1.* while (ϕ) {C} *is* f*–i.i.d.*
*2.* $\mathsf{wp}\left[C\right](1) = 1$ *(loop body terminates almost–surely).*
*3.* $\mathsf{ert}\left[C\right](0) \perp C$ *(every iteration runs in the same expected time).*

*Then for the ERT of the loop* while (ϕ) {C} *w.r.t. postruntime* <sup>f</sup> *it holds that*

$$\mathsf{ert}\left[\mathtt{while}\,(\varphi)\,\{C\}\right](f) \;=\; 1 + [\varphi]\cdot\frac{1 + \mathsf{ert}\left[C\right]\left([\neg\varphi]\cdot f\right)}{1-\mathsf{wp}\left[C\right]\left([\varphi]\right)} + [\neg\varphi]\cdot f,$$

*where we define* 0/0 := 0 *and* a/0 := ∞ *for* a ≠ 0*.*

*Proof.* We first prove

$$\mathsf{ert}\left[\mathtt{while}\,(\varphi)\,\{C\}\right](0) \;=\; 1 + [\varphi]\cdot\frac{1 + \mathsf{ert}\left[C\right](0)}{1-\mathsf{wp}\left[C\right]\left([\varphi]\right)}.\tag{\ddagger}$$

To this end, we propose the following expression for the orbit of the ert–characteristic functional of the loop w.r.t. 0:

$$F_0^n(0) \;=\; 1 + [\varphi]\cdot\left(\mathsf{ert}\left[C\right](0)\cdot\sum_{i=0}^{n-1}\mathsf{wp}\left[C\right]\left([\varphi]\right)^i \;+\; \sum_{i=0}^{n-2}\mathsf{wp}\left[C\right]\left([\varphi]\right)^i\right).$$

For a verification that the above expression is indeed the correct orbit, we refer to the rigorous proof of this theorem in [3, Appendix A.5]. Now, analogously to the reasoning in the proof of Theorem 3 (i.e. using the closed form of the geometric series and case distinction on whether $\mathsf{wp}\left[C\right]([\varphi]) < 1$ or $\mathsf{wp}\left[C\right]([\varphi]) = 1$), we get that the supremum of this orbit is indeed the right–hand side of (‡). To complete the proof, consider the following:

$$\begin{aligned} &\mathsf{ert}\left[\mathtt{while}\,(\varphi)\,\{C\}\right](f)\\ &= \mathsf{ert}\left[\mathtt{while}\,(\varphi)\,\{C\}\right](0) + \mathsf{wp}\left[\mathtt{while}\,(\varphi)\,\{C\}\right](f) &&\text{(by Theorem 2)}\\ &= 1 + [\varphi]\cdot\frac{1+\mathsf{ert}\left[C\right](0)}{1-\mathsf{wp}\left[C\right]\left([\varphi]\right)} + [\varphi]\cdot\frac{\mathsf{wp}\left[C\right]\left([\neg\varphi]\cdot f\right)}{1-\mathsf{wp}\left[C\right]\left([\varphi]\right)} + [\neg\varphi]\cdot f &&\text{(by }(\ddagger)\text{ and Theorem 3)}\\ &= 1 + [\varphi]\cdot\frac{1+\mathsf{ert}\left[C\right]\left([\neg\varphi]\cdot f\right)}{1-\mathsf{wp}\left[C\right]\left([\varphi]\right)} + [\neg\varphi]\cdot f &&\text{(by Theorem 2)} \end{aligned}$$
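The proof rule of Theorem 4 can likewise be checked against a runtime simulation. For the hypothetical loop while (x2 < x1) {x2 :≈ Unif[1...6]} with postruntime f = 0 and the cost model of Sect. 3 (one unit per guard evaluation and per random assignment), the rule gives 1 + (1 + ert[C](0))/(1 − wp[C]([x2 < x1])) = 1 + 12/(7 − x1) in states where the guard holds:

```python
import random

# Checking the proof rule of Theorem 4 against simulation on the loop
#   while (x2 < x1) { x2 :~ Unif[1..6] }     with postruntime f = 0
# (hypothetical example; guard evaluation and random assignment cost 1 each).
def ert_rule(x1):
    # 1 + (1 + ert[C](0)) / (1 - wp[C]([guard])), with ert[C](0) = 1
    return 1 + 2 / ((7 - x1) / 6)

def simulate(x1, rng):
    x2, time = 0, 0              # initial x2 = 0, so the guard holds
    while True:
        time += 1                # guard evaluation
        if x2 >= x1:
            return time
        x2 = rng.randint(1, 6)   # random assignment
        time += 1

rng = random.Random(7)
n = 100_000
est = sum(simulate(4, rng) for _ in range(n)) / n
print(ert_rule(4), est)          # both close to 5.0
```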

### **5 A Programming Language for Bayesian Networks**

So far we have derived proof rules for formal reasoning about expected outcomes and expected runtimes of i.i.d. loops (Theorems 3 and 4). In this section, we apply these results to develop a syntactic pGCL fragment that allows exact computations of closed forms of ERTs. In particular, no invariants, (super)martingales, or fixed point computations are required.

After that, we show how BNs with observations can be translated into pGCL programs within this fragment. Consequently, we call our pGCL fragment the *Bayesian Network Language*. As a result of the above translation, we obtain a systematic and automatable approach to compute the *expected sampling time* of a BN in the presence of observations. That is, the expected time it takes to obtain a single sample that satisfies all observations.

### **5.1 The Bayesian Network Language**

Programs in the Bayesian Network Language are organized as sequences of blocks. Every block is associated with a single variable, say x, and satisfies two constraints: First, no variable other than x is modified inside the block, i.e. occurs on the left–hand side of a random assignment. Second, every variable accessed inside of a guard has been initialized before. These restrictions ensure that there is no data flow across multiple executions of the same block. Thus, intuitively, all loops whose body is composed from blocks (as described above) are f–i.i.d. loops.

**Definition 6 (The Bayesian Network Language).** *Let Vars* = {x1, x2, ...} *be a finite set of program variables as in Sect. 3. The set of programs in Bayesian Network Language, denoted* BNL*, is given by the grammar*

$$\begin{aligned} C \;&\longrightarrow\; \mathit{Seq} \;\mid\; \texttt{repeat}\,\{\mathit{Seq}\}\ \texttt{until}\,(\psi) \;\mid\; C;\, C\\ \mathit{Seq} \;&\longrightarrow\; \mathit{Seq};\, \mathit{Seq} \;\mid\; B_{x_1} \;\mid\; B_{x_2} \;\mid\; \ldots\\ B_{x_i} \;&\longrightarrow\; x_i \mathrel{:\approx} \mu \;\mid\; \texttt{if}\,(\varphi)\,\{x_i \mathrel{:\approx} \mu\}\ \texttt{else}\,\{B_{x_i}\} \qquad (\text{rule exists for all } x_i \in \mathit{Vars}), \end{aligned}$$

*where* $x_i$ ∈ *Vars is a program variable, all variables in* ϕ *have been initialized before, and* $B_{x_i}$ *is a non–terminal parameterized with program variable* $x_i$ ∈ *Vars. That is, for all* $x_i$ ∈ *Vars there is a non–terminal* $B_{x_i}$*. Moreover,* ψ *is an arbitrary guard and* μ *is a distribution expression of the form* $\mu = \sum_{j=1}^{n} p_j \cdot \langle a_j \rangle$ *with* $a_j \in \mathbb{Q}$ *for* 1 ≤ j ≤ n*.*

*Example 4.* Consider the BNL program C*dice*:

$$x_1 \mathrel{:\approx} \mathtt{Unif}[1\ldots 6];\ \mathtt{repeat}\,\{x_2 \mathrel{:\approx} \mathtt{Unif}[1\ldots 6]\}\ \mathtt{until}\,(x_2 \ge x_1)$$

This program first throws a fair die. After that it keeps throwing a second die until its result is at least as large as the first die. 

For any C ∈ BNL, our goal is to compute the exact ERT of C, i.e. $\mathsf{ert}\left[C\right](0)$. For loop–free programs, this amounts to a straightforward application of the ert calculus presented in Sect. 3. To deal with loops, however, we would ordinarily have to perform fixed point computations or require user–supplied artifacts, e.g. invariants, supermartingales, etc. For BNL programs, on the other hand, it suffices to apply the proof rules developed in Sect. 4. As a result, we directly obtain an exact closed form for the ERT of a loop. This is a consequence of the fact that all loops in BNL are f–i.i.d., which we establish in the following.

By definition, every loop in BNL is of the form repeat {$B_{x_i}$} until (ψ), which is equivalent to $B_{x_i}$; while (¬ψ) {$B_{x_i}$}. Hence, we want to apply Theorem 4 to that while loop. Our first step is to discharge the theorem's premises:

**Lemma 3.** *Let Seq be a sequence of* BNL*–blocks,* g ∈ E*, and* ψ *be a guard. Then:*
*1.* $\mathsf{wp}\left[\mathit{Seq}\right](g) \perp \mathit{Seq}$*,*
*2.* $\mathsf{ert}\left[\mathit{Seq}\right](g) \perp \mathit{Seq}$*, and*
*3.* *the loop* while (¬ψ) {*Seq*} *is* f*–i.i.d. for every* f ∈ E*.*


*Proof.* 1. is proven by induction on the length of the sequence of blocks *Seq* and 2. is a consequence of 1., see [3, Appendix A.6]. 3. follows immediately from 1. by instantiating g with [¬ψ] and [ψ] · f, respectively.

We are now in a position to derive a closed form for the ERT of loops in BNL.

**Theorem 5.** *For every loop* repeat {*Seq*} until (ψ) <sup>∈</sup> BNL *and every* <sup>f</sup> <sup>∈</sup> <sup>E</sup>*,*

$$\mathsf{ert}\left[\mathtt{repeat}\,\{\mathit{Seq}\}\ \mathtt{until}\,(\psi)\right](f) \;=\; \frac{1 + \mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right)}{\mathsf{wp}\left[\mathit{Seq}\right]\left([\psi]\right)}.$$

*Proof.* Let f ∈ E. Moreover, recall that repeat {*Seq*} until (ψ) is equivalent to the program *Seq*; while (¬ψ) {*Seq*} ∈ BNL. Applying the semantics of ert (Table 2), we proceed as follows:

$$\mathsf{ert}\left[\mathtt{repeat}\,\{\mathit{Seq}\}\ \mathtt{until}\,(\psi)\right](f) \;=\; \mathsf{ert}\left[\mathit{Seq}\right]\left(\mathsf{ert}\left[\mathtt{while}\,(\neg\psi)\,\{\mathit{Seq}\}\right](f)\right)$$

Since the loop body *Seq* is loop–free, it terminates certainly, i.e. $\mathsf{wp}\left[\mathit{Seq}\right](1) = 1$ (Premise 2 of Theorem 4). Together with Lemma 3.1. and 3., all premises of Theorem 4 are satisfied. Hence, we obtain a closed form for $\mathsf{ert}\left[\mathtt{while}\,(\neg\psi)\,\{\mathit{Seq}\}\right](f)$:

$$=\; \mathsf{ert}\left[\mathit{Seq}\right]\left(\underbrace{1+\frac{[\neg\psi]\cdot\left(1+\mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right)\right)}{1-\mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\right)}+[\psi]\cdot f}_{=:\,g}\right)$$

By Theorem 2, we know $\mathsf{ert}\left[\mathit{Seq}\right](g) = \mathsf{ert}\left[\mathit{Seq}\right](0) + \mathsf{wp}\left[\mathit{Seq}\right](g)$ for any g. Thus:

$$=\; \mathsf{ert}\left[\mathit{Seq}\right](0) + \mathsf{wp}\left[\mathit{Seq}\right]\left(\underbrace{1+\frac{[\neg\psi]\cdot\left(1+\mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right)\right)}{1-\mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\right)}+[\psi]\cdot f}_{g}\right)$$

Since wp is linear (Theorem 1 (2)), we obtain:

$$=\; \mathsf{ert}\left[\mathit{Seq}\right](0) + \underbrace{\mathsf{wp}\left[\mathit{Seq}\right](1)}_{=\,1} + \mathsf{wp}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right) + \mathsf{wp}\left[\mathit{Seq}\right]\left(\frac{[\neg\psi]\cdot\left(1+\mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right)\right)}{1-\mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\right)}\right)$$

By a few simple algebraic transformations, this coincides with:

$$=\; 1 + \mathsf{ert}\left[\mathit{Seq}\right](0) + \mathsf{wp}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right) + \mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\cdot\frac{1+\mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right)}{1-\mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\right)}\right)$$

Let R denote the fraction above. Then Lemma 3.1. and 2. imply $R \perp \mathit{Seq}$. We may thus apply Lemma 1 to derive $\mathsf{wp}\left[\mathit{Seq}\right]([\neg\psi] \cdot R) = \mathsf{wp}\left[\mathit{Seq}\right]([\neg\psi]) \cdot R$. Hence:

$$=\; 1 + \mathsf{ert}\left[\mathit{Seq}\right](0) + \mathsf{wp}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right) + \mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\right)\cdot\frac{1+\mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right)}{1-\mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\right)}$$

Again, by Theorem 2, we know that $\mathsf{ert}\left[\mathit{Seq}\right](g) = \mathsf{ert}\left[\mathit{Seq}\right](0) + \mathsf{wp}\left[\mathit{Seq}\right](g)$ for any g. Thus, for g = [ψ] · f, this yields:

$$=\; 1 + \mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right) + \mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\right)\cdot\frac{1+\mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right)}{1-\mathsf{wp}\left[\mathit{Seq}\right]\left([\neg\psi]\right)}$$

Then a few algebraic transformations lead us to the claimed ERT:

$$=\; \frac{1+\mathsf{ert}\left[\mathit{Seq}\right]\left([\psi]\cdot f\right)}{\mathsf{wp}\left[\mathit{Seq}\right]\left([\psi]\right)}.$$

Note that Theorem 5 holds for arbitrary postexpectations f ∈ E. This enables *compositional reasoning* about ERTs of BNL programs. Since all other rules of the ert–calculus for loop–free programs amount to simple syntactic transformations (see Table 2), we conclude that

**Corollary 1.** *For any* C ∈ BNL*, a closed form for* $\mathsf{ert}\left[C\right](0)$ *can be computed compositionally.*

*Example 5.* Theorem 5 allows us to comfortably compute the ERT of the BNL program C*dice* introduced in Example 4:

$$x_1 \mathrel{:\approx} \mathtt{Unif}[1\ldots 6];\ \mathtt{repeat}\,\{x_2 \mathrel{:\approx} \mathtt{Unif}[1\ldots 6]\}\ \mathtt{until}\,(x_2 \ge x_1)$$

For the ERT, we have

$$\begin{aligned} &\mathsf{ert}\left[C_{\mathit{dice}}\right](0)\\ &=\; \mathsf{ert}\left[x_1 \mathrel{:\approx} \mathtt{Unif}[1\ldots 6]\right]\left(\mathsf{ert}\left[\mathtt{repeat}\,\{\ldots\}\ \mathtt{until}\,(x_2 \ge x_1)\right](0)\right) &&\text{(Table 2)}\\ &=\; \mathsf{ert}\left[x_1 \mathrel{:\approx} \mathtt{Unif}[1\ldots 6]\right]\left(\frac{1 + \mathsf{ert}\left[x_2 \mathrel{:\approx} \mathtt{Unif}[1\ldots 6]\right]\left(\left[x_2 \ge x_1\right]\right)}{\mathsf{wp}\left[x_2 \mathrel{:\approx} \mathtt{Unif}[1\ldots 6]\right]\left(\left[x_2 \ge x_1\right]\right)}\right) &&\text{(Theorem 5)}\\ &=\; \sum_{1 \le i \le 6} \frac{1}{6}\cdot\frac{1 + \sum_{1 \le j \le 6} \frac{1}{6}\cdot\left[j \ge i\right]}{\sum_{1 \le j \le 6} \frac{1}{6}\cdot\left[j \ge i\right]} &&\text{(Table 2)}\\ &=\; 3.45. \end{aligned}$$
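The value 3.45 = 69/20 can be read off directly as the expected total number of die rolls: one roll for x1 plus a geometrically distributed number of rolls for x2 with success probability (7 − x1)/6. A small sanity-check sketch:

```python
import random
from fractions import Fraction

# 3.45 as the expected total number of die rolls of C_dice: one roll for x1,
# then a Geometric((7 - x1)/6) number of rolls for x2.
exact = sum(Fraction(1, 6) * (1 + Fraction(6, 7 - i)) for i in range(1, 7))
print(exact)                     # 69/20

def rolls(rng):
    n, x1 = 1, rng.randint(1, 6)     # the first roll fixes x1
    while True:
        n += 1                       # one more roll for x2
        if rng.randint(1, 6) >= x1:
            return n

rng = random.Random(1)
est = sum(rolls(rng) for _ in range(100_000)) / 100_000
print(est)                       # close to 3.45
```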

#### **5.2 Bayesian Networks**

To reason about expected sampling times of BNs, it remains to develop a sound translation from BNs with observations into equivalent BNL programs. A BN is a probabilistic graphical model that is given by a directed acyclic graph. Every node is a random variable and a directed edge between two nodes expresses a probabilistic dependency between these nodes.

As a running example, consider the BN depicted in Fig. 3 (inspired by [31]) that models the mood of students after taking an exam. The network contains four random variables. They represent the difficulty of the exam (D), the level of preparation of a student (P), the achieved grade (G), and the resulting mood (M). For simplicity, let us assume that each random variable assumes either 0 or 1. The edges express that the student's mood depends on the achieved grade which, in turn, depends on the difficulty of the exam and the preparation of the student. Every node is accompanied by a table that provides the conditional probabilities of a node *given* the values of all the nodes it depends upon. We can then use the BN to answer queries such as "What is the probability that a

**Fig. 3.** A Bayesian network

student is well–prepared for an exam (P = 1), but ends up with a bad mood (M = 0)?"

In order to translate BNs into equivalent BNL programs, we need a formal representation first. Technically, we consider *extended* BNs in which nodes may additionally depend on inputs that are not represented by nodes in the network. This allows us to define a compositional translation without modifying conditional probability tables.

Towards a formal definition of extended BNs, we use the following notation. A tuple (s1,...,sk) <sup>∈</sup> <sup>S</sup><sup>k</sup> of length <sup>k</sup> over some set <sup>S</sup> is denoted by **<sup>s</sup>**. The empty tuple is ε. Moreover, for 1 ≤ i ≤ k, the i-th element of tuple **s** is given by **s**(i). To simplify the presentation, we assume that all nodes and all inputs are represented by natural numbers.

**Definition 7.** *An* extended Bayesian network*, EBN for short, is a tuple* B = (V,I,E,*Vals*, dep, cpt)*, where*
- V *is a finite set of* nodes *and* I *is a finite set of* inputs *with* V ∩ I = ∅*,*
- E ⊆ V × V *is a set of edges such that* (V, E) *is a directed acyclic graph,*
- *Vals is a finite set of values,*
- dep: V → (V ∪ I)<sup>∗</sup> *assigns to each node the sequence of nodes and inputs it depends on, and*
- cpt *assigns to each node* v ∈ V *with* k = |dep(v)| *a* conditional probability table


$$\mathsf{cpt}[v]\colon \mathit{Vals}^k \to \mathit{Vals} \to [0, 1] \quad \text{such that for all } \mathbf{z} \in \mathit{Vals}^k\colon \quad \sum_{a \in \mathit{Vals}} \mathsf{cpt}[v](\mathbf{z})(a) \;=\; 1.$$

*Here, the* i*-th entry in a tuple* **z** ∈ *Vals*<sup>k</sup> *corresponds to the value assigned to the* i*-th entry in the sequence of dependencies* dep(v)*.*

*A* Bayesian network *(BN) is an extended BN without inputs, i.e.* I = ∅*. In particular, the dependency function is of the form* dep: V → V <sup>∗</sup>*.*

*Example 6.* The formalization of our example BN (Fig. 3) is straightforward. For instance, the dependencies of variable G are given by dep(G)=(D, P) (assuming D is encoded by an integer less than P). Furthermore, every entry in the conditional probability table of node G corresponds to an evaluation of the function cpt[G]. For example, if D = 1, P = 0, and G = 1, we have cpt[G](1, 0)(1) = 0.4. 

In general, the conditional probability table cpt determines the conditional probability distribution of each node v ∈ V given the nodes and inputs it depends on. Formally, we interpret an entry in a conditional probability table as follows:

$$\Pr\left(v = a \mid \mathsf{dep}(v) = \mathbf{z}\right) \; = \; \mathsf{cpt}[v](\mathbf{z})(a),$$

where v ∈ V is a node, a ∈ Vals is a value, and **z** is a tuple of values of length |dep(v)|. Then, by the chain rule, the joint probability of a BN is given by the product of its conditional probability tables (cf. [4]).

**Definition 8.** *Let* B = (V,I,E,*Vals*, dep, cpt) *be an extended Bayesian network. Moreover, let* <sup>W</sup> <sup>⊆</sup> <sup>V</sup> *be a downward closed*<sup>5</sup> *set of nodes. With each* <sup>w</sup> <sup>∈</sup> <sup>W</sup> <sup>∪</sup>I*, we associate a fixed value* w ∈ *Vals. This notation is lifted pointwise to tuples of nodes and inputs. Then the* joint probability *in which nodes in* W *assume values* W *is given by*

$$\Pr\left(W = \underline{W}\right) \quad = \prod\_{v \in W} \Pr\left(v = \underline{v} \mid \mathsf{dep}(v) = \underline{\mathsf{dep}(v)}\right) \quad = \prod\_{v \in W} \mathsf{cpt}[v](\underline{\mathsf{dep}(v)})(\underline{v}).$$

*The conditional joint probability distribution of a set of nodes* W*, given observations on a set of nodes* O*, is then given by the quotient* Pr(W=W) /Pr(O=O)*.*

For example, the probability of a student having a bad mood, i.e. M = 0, after getting a bad grade (G = 0) for an easy exam (D = 0) given that she was well–prepared, i.e. P = 1, is

$$\begin{aligned} \Pr\left(D=0, G=0, M=0 \mid P=1\right) &= \frac{\Pr\left(D=0, G=0, M=0, P=1\right)}{\Pr\left(P=1\right)} \\ &= \frac{0.9 \cdot 0.5 \cdot 0.6 \cdot 0.3}{0.3} = 0.27. \end{aligned}$$
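The same quotient can be spelled out programmatically; the assignment of the four factors to individual cpt entries below is read off the computation above and is hypothetical insofar as the full tables of Fig. 3 are not reproduced here:

```python
# Chain-rule computation of the conditional probability from the example above:
# the joint probability is the product of the cpt entries, and conditioning on
# P = 1 divides by Pr(P = 1) = 0.3 (factor assignment read off the text).
pr_P1 = 0.3
joint = 0.9 * 0.5 * 0.6 * pr_P1      # Pr(D=0, G=0, M=0, P=1) by the chain rule
conditional = joint / pr_P1          # Pr(D=0, G=0, M=0 | P=1)
print(conditional)                   # ~ 0.27
```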

#### **5.3 From Bayesian Networks to BNL**

We now develop a compositional translation from EBNs into BNL programs. Throughout this section, let B = (V,I,E,Vals, dep, cpt) be a fixed EBN. Moreover, with every node or input <sup>v</sup> <sup>∈</sup> <sup>V</sup> <sup>∪</sup> <sup>I</sup> we associate a program variable <sup>x</sup>v.

We proceed in three steps: First, *every node together with its dependencies* is translated into a *block* of a BNL program. These blocks are then composed into a single BNL program that captures the whole BN. Finally, we implement conditioning by means of rejection sampling.

<sup>5</sup> W is downward closed if v ∈ W and (u, v) ∈ E implies u ∈ W.

*Step 1:* We first present the atomic building blocks of our translation. Let v ∈ V be a node. Moreover, let **<sup>z</sup>** <sup>∈</sup> Vals|dep(v)<sup>|</sup> be an evaluation of the dependencies of v. That is, **z** is a tuple that associates a value with every node and input that v depends on (in the same order as dep(v)). For every node v and evaluation of its dependencies **z**, we define a corresponding guard and a random assignment:

$$\begin{aligned} \mathsf{guard}_{\mathcal{B}}(v, \mathbf{z}) \;&=\; \bigwedge_{1 \le i \le |\mathsf{dep}(v)|} x_{\mathsf{dep}(v)(i)} = \mathbf{z}(i)\\ \mathsf{assign}_{\mathcal{B}}(v, \mathbf{z}) \;&=\; x_v \mathrel{:\approx} \sum_{a \in \mathit{Vals}} \mathsf{cpt}[v](\mathbf{z})(a)\cdot\langle a\rangle \end{aligned}$$

Note that dep(v)(i) is the i-th element from the sequence of nodes dep(v).

*Example 7.* Continuing our previous example (see Fig. 3), assume we fixed the node v = G. Moreover, let **z** = (1, 0) be an evaluation of dep(v) = (D, P). Then the guard and assignment corresponding to v and **z** are given by:

$$\begin{aligned} \mathsf{guard}_{\mathcal{B}}(G, (1, 0)) \;&=\; x_D = 1 \,\wedge\, x_P = 0, \quad\text{and}\\ \mathsf{assign}_{\mathcal{B}}(G, (1, 0)) \;&=\; x_G \mathrel{:\approx} 0.6\cdot\langle 0\rangle + 0.4\cdot\langle 1\rangle. \end{aligned}\qquad\triangle$$
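Step 1 of the translation is easy to prototype. The following sketch builds the guard and the assignment for node G with dep(G) = (D, P) and the cpt row from Example 7, emitting them as strings (all identifiers and the concrete output syntax are illustrative):

```python
# Sketch of Step 1: emitting guard_B(v, z) and assign_B(v, z) as strings for
# node G with dep(G) = (D, P) and the cpt row of Example 7.
dep = {"G": ("D", "P")}
cpt_G = {(1, 0): {0: 0.6, 1: 0.4}}   # cpt[G](1,0)(0) = 0.6, cpt[G](1,0)(1) = 0.4

def guard(v, z):
    # conjunction of equalities x_u = z(i), one per dependency u of v
    return " and ".join(f"x{u} = {z[i]}" for i, u in enumerate(dep[v]))

def assign(v, z, cpt):
    # random assignment sampling from the distribution in row z of the cpt
    dist = " + ".join(f"{p} * <{a}>" for a, p in sorted(cpt[z].items()))
    return f"x{v} :~ {dist}"

print(guard("G", (1, 0)))            # xD = 1 and xP = 0
print(assign("G", (1, 0), cpt_G))    # xG :~ 0.6 * <0> + 0.4 * <1>
```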

We then translate every node v ∈ V into a program block that uses guards to determine the rows of the conditional probability table under consideration. After that, the program samples from the resulting probability distribution using the previously constructed assignments. In case a node depends neither on other nodes nor on input variables, we omit the guards. Formally,

$$
\mathsf{Block}\_{\mathcal{B}}(v) = \begin{cases}
\ \mathsf{assign}\_{\mathcal{B}}(v, \varepsilon) & \text{if } |\mathsf{dep}(v)| = 0 \\[4pt]
\ \begin{array}{l}
\mathsf{if}\ (\mathsf{guard}\_{\mathcal{B}}(v, \mathbf{z}\_1))\ \{ \\
\quad \mathsf{assign}\_{\mathcal{B}}(v, \mathbf{z}\_1) \\
\}\ \mathsf{else\ if}\ (\mathsf{guard}\_{\mathcal{B}}(v, \mathbf{z}\_2))\ \{ \\
\quad \mathsf{assign}\_{\mathcal{B}}(v, \mathbf{z}\_2) \\
\}\ \ldots\ \mathsf{else}\ \{ \\
\quad \mathsf{assign}\_{\mathcal{B}}(v, \mathbf{z}\_m) \\
\}
\end{array} & \text{if } |\mathsf{dep}(v)| = k > 0 \text{ and } \mathsf{Vals}^{k} = \{\mathbf{z}\_1, \ldots, \mathbf{z}\_m\}
\end{cases}
$$
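Concretely, the guard-then-sample behaviour of a single block can be sketched in Python. Here a node's CPT is encoded as a dictionary from dependency evaluations **z** to finite distributions; the encoding and the function name are ours, not part of BNL:

```python
import random

def sample_node(cpt_v, dep_values, rng=random):
    """Direct reading of Block_B(v): select the CPT row whose guard
    matches the evaluation of v's dependencies, then sample a value
    from that row (the random assignment)."""
    # guard: x_dep(v)(i) = z(i) for all i, realized as a dict lookup
    row = cpt_v[tuple(dep_values)]
    u, acc = rng.random(), 0.0
    for value, prob in row.items():
        acc += prob
        if u < acc:
            return value
    return value  # guard against floating-point round-off
```

For instance, the row from Example 7 would be encoded as `cpt_G = {(1, 0): {0: 0.6, 1: 0.4}}`, and `sample_node(cpt_G, (1, 0))` returns 0 with probability 0.6 and 1 with probability 0.4.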

*Remark 1.* The guards under consideration are conjunctions of equalities between variables and literals. We could thus use a more efficient translation of conditional probability tables by adding a switch-case statement to our probabilistic programming language. Such a statement is of the form

$$\mathsf{switch}(\mathbf{x})\ \{\ \mathsf{case}\ \mathbf{a}\_1{:}\ C\_1\ \ \mathsf{case}\ \mathbf{a}\_2{:}\ C\_2\ \ \ldots\ \ \mathsf{default}{:}\ C\_m\ \},$$

where **x** is a tuple of variables, and **a**<sub>1</sub>, ..., **a**<sub>m−1</sub> are tuples of rational numbers of the same length as **x**. With respect to the wp semantics, a switch-case statement is syntactic sugar for nested if-then-else blocks as used in the above translation. However, the runtime model of a switch-case statement requires just a single guard evaluation instead of the potentially multiple guard evaluations incurred by nested if-then-else blocks. Since this adaptation is straightforward, we opted for nested if-then-else blocks to keep our programming language simple and to allow, in principle, more general guards.

*Step 2:* The next step is to translate a complete extended BN into a BNL program. To this end, we compose the blocks obtained from each node, starting at the roots of the network, i.e. all nodes that have no incoming edges. Formally,

$$roots(\mathcal{B}) = \{ v \in V\_{\mathcal{B}} \mid \neg \exists u \in V\_{\mathcal{B}} \colon (u, v) \in E\_{\mathcal{B}} \}.$$

After translating all roots of the network, we remove them from the graph, i.e. every root becomes an input, and proceed with the translation until all nodes have been removed. More precisely, given a set of nodes S ⊆ V, the extended BN B \ S obtained by removing S from B is defined as

$$\mathcal{B} \setminus S = \left( V \setminus S,\ I \cup S,\ E \setminus (V \times S \,\cup\, S \times V),\ \mathsf{dep},\ \mathsf{cpt} \right).$$

With these auxiliary definitions readily available, an extended BN B is translated into a BNL program as follows:

$$\mathsf{BNL}(\mathcal{B}) = \begin{cases}
\mathsf{Block}\_{\mathcal{B}}(r\_1);\ \ldots;\ \mathsf{Block}\_{\mathcal{B}}(r\_m) & \text{if } \mathsf{roots}(\mathcal{B}) = \{r\_1, \ldots, r\_m\} = V \\
\mathsf{Block}\_{\mathcal{B}}(r\_1);\ \ldots;\ \mathsf{Block}\_{\mathcal{B}}(r\_m);\ \mathsf{BNL}(\mathcal{B} \setminus \mathsf{roots}(\mathcal{B})) & \text{if } \mathsf{roots}(\mathcal{B}) = \{r\_1, \ldots, r\_m\} \subsetneq V
\end{cases}$$
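The root-peeling recursion above induces a topological ordering of the nodes; a minimal Python sketch (node and edge encodings are ours) makes the mechanism explicit:

```python
def simulation_order(nodes, edges):
    """Mirror of the BNL(B) recursion: repeatedly emit and remove the
    roots of the DAG, i.e. nodes with no incoming edge from a node that
    is still remaining (removed roots behave like inputs)."""
    remaining, order = set(nodes), []
    while remaining:
        roots = {v for v in remaining
                 if not any(u in remaining and w == v for (u, w) in edges)}
        if not roots:
            raise ValueError("graph contains a cycle; not a BN")
        order.extend(sorted(roots))  # sorted only to make the order deterministic
        remaining -= roots
    return order
```

For the mood network of Fig. 3 with edges D → G, P → G, G → M, this yields the blocks for D and P first, then G, then M.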

*Step 3:* To complete the translation, it remains to account for observations. Let *cond*: V → Vals ∪ {⊥} be a function mapping every node either to an observed value in Vals or to ⊥. In the former case, we interpret *cond*(v) as an observation that node v has value *cond*(v). Otherwise, i.e. if *cond*(v) = ⊥, the value of node v is *not observed*. We collect all observed nodes in the set O = {v ∈ V | *cond*(v) ≠ ⊥}. It is then natural to incorporate conditioning into our translation by means of rejection sampling: we repeatedly execute a BNL program until every observed node has the desired value *cond*(v). In the presence of observations, we translate the extended BN B into a BNL program as follows:

$$\mathsf{BNL}(\mathcal{B}, \mathit{cond}) \;=\; \mathsf{repeat}\ \{\ \mathsf{BNL}(\mathcal{B})\ \}\ \mathsf{until}\left( \bigwedge\_{v \in O} x\_v = \mathit{cond}(v) \right).$$
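The repeat-until loop is ordinary rejection sampling, which can be sketched generically in Python (the function names and the sample encoding are ours):

```python
def rejection_sample(simulate_once, observations_hold):
    """repeat { BNL(B) } until (all observed nodes match): keep
    resampling the whole network and count the attempts, whose
    expectation governs the expected sampling time."""
    attempts = 0
    while True:
        attempts += 1
        sample = simulate_once()   # one full forward pass through the BN
        if observations_hold(sample):
            return sample, attempts
```

Here `simulate_once` would execute the blocks of Step 2 in topological order, and `observations_hold` checks the conjunction over the observed nodes O.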

*Example 8.* Consider, again, the BN B depicted in Fig. 3. Moreover, assume we observe P = 1. Hence, the conditioning function *cond* is given by *cond*(P)=1 and *cond*(v) = ⊥ for v ∈ {D, G, M}. Then the translation of B and *cond*, i.e. *BNL*(B, *cond*), is the BNL program C*mood* depicted in Fig. 4. 

Since our translation yields a BNL program for any given BN, we can compositionally compute a closed form for the expected simulation time of a BN. This is an immediate consequence of Corollary 1.

We still have to prove, however, that our translation is sound, i.e. the conditional joint probabilities inferred from a BN coincide with the (conditional) joint probabilities from the corresponding BNL program. Formally, we obtain the following soundness result.


**Fig. 4.** The BNL program C*mood* obtained from the BN in Fig. 3.

**Theorem 6 (Soundness of Translation).** *Let* B = (V, I, E, *Vals*, dep, cpt) *be a BN and cond*: V → *Vals* ∪ {⊥} *be a function determining the observed nodes. For each node and input* v*, let* $\underline{v}$ ∈ *Vals be a fixed value associated with* v*. In particular, we set* $\underline{v}$ = *cond*(v) *for each observed node* v ∈ O*. Then*

$$\mathsf{wp}\left[\mathsf{BNL}(\mathcal{B}, \mathit{cond})\right]\left(\left[\bigwedge\_{v\in V\setminus O}x\_{v}=\underline{v}\right]\right) \\ = \frac{\Pr\left(\bigwedge\_{v\in V}v=\underline{v}\right)}{\Pr\left(\bigwedge\_{o\in O}o=\underline{o}\right)}.$$

*Proof.* Without conditioning, i.e. O = ∅, the proof proceeds by induction on the number of nodes of B. With conditioning, we additionally apply Theorems 3 and 5 to deal with loops introduced by observed nodes. See [3, Appendix A.7].

*Example 9 (Expected Sampling Time of a BN).* Consider, again, the BN B in Fig. 3. Moreover, recall the corresponding program C*mood* derived from B in Fig. 4, where we observed P = 1. By Theorem 6, we can determine, by means of weakest precondition reasoning, the probability that a student who got a bad grade in an easy exam was well-prepared. This yields

$$\begin{aligned} & \mathsf{wp}\left[C\_{\mathit{mood}}\right]\left(\left[x\_D = 0 \wedge x\_G = 0 \wedge x\_M = 0\right]\right) \\ & \quad = \frac{\Pr\left(D = 0, G = 0, M = 0, P = 1\right)}{\Pr\left(P = 1\right)} = \ 0.27. \end{aligned}$$

Furthermore, by Corollary 1, it is straightforward to determine the expected time to obtain a single sample of B that satisfies the observation P = 1:

$$\mathsf{ert}\left[C\_{\mathit{mood}}\right](0) = \frac{1 + \mathsf{ert}\left[C\_{\mathit{loop\text{-}body}}\right](0)}{\mathsf{wp}\left[C\_{\mathit{loop\text{-}body}}\right]\left(\left[P=1\right]\right)} = 23.4 + \frac{1}{15} = 23.4\overline{6}. \quad \triangle$$
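The shape of this computation, expected sampling time = (1 + expected body cost) / success probability, can be sanity-checked by simulation. The sketch below uses a toy loop whose body has a fixed cost and succeeds with probability p; the numbers are illustrative and not those of the program in Fig. 4:

```python
import random

def mean_sampling_time(p, body_cost, trials=200_000, seed=7):
    """Monte Carlo estimate of ert[repeat C until phi](0) for a loop
    body with fixed cost and success probability p; the proof rule
    predicts the value (1 + body_cost) / p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        while True:
            total += 1 + body_cost   # 1 for evaluating the guard
            if rng.random() < p:
                break
    return total / trials
```

With p = 0.5 and a body cost of 4, the estimate converges to (1 + 4) / 0.5 = 10.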

### **6 Implementation**

We implemented a prototype in Java to analyze expected sampling times of Bayesian networks. More concretely, our tool takes as input a BN together with observations in the popular Bayesian Network Interchange Format.<sup>6</sup> The BN is then translated into a BNL program as shown in Sect. 5. Our tool applies the ert-calculus together with the proof rules developed in Sect. 4 to compute the exact expected runtime of the BNL program.

The size of the resulting BNL program is linear in the total number of rows of all conditional probability tables in the BN. The program size is thus *not* the bottleneck of our analysis. As we are dealing with an NP-hard problem [12,13], it is not surprising that our algorithm has worst-case exponential time complexity. The space complexity of our algorithm is also exponential in the worst case: as an expectation is propagated backwards through an if-clause of the BNL program, the size of the expectation potentially multiplies. This is also why our analysis runs out of memory on some benchmarks.

We evaluated our implementation on the *largest* BNs in the Bayesian Network Repository [46], which consists, to a large extent, of real-world BNs, including expert systems for, e.g., electromyography (munin) [2], hematopathology diagnosis (hepar2) [42], weather forecasting (hailfinder) [1], and printer troubleshooting in Windows 95 (win95pts) [45, Sect. 5.6.2]. For an evaluation of *all* BNs in the repository, we refer to the extended version of this paper [3, Sect. 6].

All experiments were performed on an HP BL685C G7. Although up to 48 cores with 2.0 GHz were available, only one core was used apart from Java's garbage collection. The Java virtual machine was limited to 8 GB of RAM.

Our experimental results are shown in Table 3. The number of nodes of the considered BNs ranges from 56 to 1041. For each Bayesian network, we computed the expected sampling time (EST) for different collections of observed nodes (#obs). Furthermore, Table 3 provides the *average Markov Blanket size*, i.e. the average number of parents, children and children's parents of nodes in the BN [43], as an indicator measuring how independent nodes in the BN are.

Observations were picked at random. Note that the time required by our prototype varies depending on both the number of observed nodes and the actual observations. Thus, there are cases in which we run out of memory although the total number of observations is small.

In order to obtain an understanding of what the EST corresponds to in actual execution times on a real machine, we also performed simulations for the win95pts network. More precisely, we generated Java programs from this network analogously to the translation in Sect. 5. This allowed us to estimate that our Java setup can execute 9.714 · 10<sup>6</sup> steps (in terms of EST) per second.

For the win95pts with 17 observations, an EST of 1.11·10<sup>15</sup> then corresponds to an expected time of approximately 3.6 *years* in order to obtain a *single* valid sample. We were additionally able to find a case with 13 observed nodes where our tool discovered within 0.32 s an EST that corresponds to approximately 4.3 *million years*. In contrast, exact inference using variable elimination was almost instantaneous. This demonstrates that knowing expected sampling times upfront can indeed be beneficial when selecting an inference method.

<sup>6</sup> http://www.cs.cmu.edu/∼fgcozman/Research/InterchangeFormat/.


**Table 3.** Experimental results. Time is in seconds. MO denotes out of memory.

### **7 Conclusion**

We presented a syntactic notion of independent and identically distributed probabilistic loops and derived dedicated proof rules to determine exact expected outcomes and runtimes of such loops. These rules do not require any user–supplied information, such as invariants, (super)martingales, etc.

Moreover, we isolated a syntactic fragment of probabilistic programs for which expected runtimes can be computed in a highly automatable fashion. This fragment is non-trivial: we show that all Bayesian networks can be translated into programs within this fragment. Hence, we obtain an automated formal method for computing expected simulation times of Bayesian networks. We implemented this method and successfully applied it to various real-world BNs that stem from, amongst others, medical applications. Remarkably, our tool was capable of proving extremely large expected sampling times within seconds.

There are several directions for future work: For example, there exist subclasses of BNs for which exact inference is in P, e.g. polytrees. Are there analogies for probabilistic programs? Moreover, it would be interesting to consider more complex graphical models, such as recursive BNs [16].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Relational Reasoning for Markov Chains in a Probabilistic Guarded Lambda Calculus**

Alejandro Aguirre<sup>1(B)</sup>, Gilles Barthe<sup>1</sup>, Lars Birkedal<sup>2</sup>, Aleš Bizjak<sup>2</sup>, Marco Gaboardi<sup>3</sup>, and Deepak Garg<sup>4</sup>

> <sup>1</sup> IMDEA Software Institute, Madrid, Spain alejandro.aguirre@imdea.org <sup>2</sup> Aarhus University, Aarhus, Denmark <sup>3</sup> University at Buffalo, SUNY, Buffalo, USA <sup>4</sup> MPI-SWS, Kaiserslautern and Saarbrücken, Germany

**Abstract.** We extend the simply-typed guarded λ-calculus with discrete probabilities and endow it with a program logic for reasoning about relational properties of guarded probabilistic computations. This provides a framework for programming and reasoning about infinite stochastic processes like Markov chains. We prove the logic sound by interpreting its judgements in the topos of trees and by using probabilistic couplings for the semantics of relational assertions over distributions on discrete types.

The program logic is designed to support syntax-directed proofs in the style of relational refinement types, but retains the expressiveness of higher-order logic extended with discrete distributions, and the ability to reason relationally about expressions that have different types or syntactic structure. In addition, our proof system leverages a well-known theorem from the coupling literature to justify better proof rules for relational reasoning about probabilistic expressions. We illustrate these benefits with a broad range of examples that were beyond the scope of previous systems, including shift couplings and lump couplings between random walks.

### **1 Introduction**

Stochastic processes are often used in mathematics, physics, biology or finance to model the evolution of systems with uncertainty. In particular, Markov chains are "memoryless" stochastic processes, in the sense that the evolution of the system depends only on the current state and not on its history. Perhaps the most emblematic example of a (discrete-time) Markov chain is the simple random walk over the integers, which starts at 0 and on each step moves one position either left or right with uniform probability. Let p<sub>i</sub> be the position at time i. Then, this Markov chain can be described as:

$$p\_0 = 0 \qquad p\_{i+1} = \begin{cases} p\_i + 1 \text{ with probability } 1/2\\ p\_i - 1 \text{ with probability } 1/2 \end{cases}$$
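A single trajectory of this chain is easy to simulate; the following sketch (our own helper, not part of the paper's calculus) produces the sequence p_0, ..., p_steps:

```python
import random

def random_walk(steps, seed=0):
    """Simple random walk over the integers: p_0 = 0, and each step
    moves +1 or -1 with probability 1/2."""
    rng = random.Random(seed)
    positions = [0]
    for _ in range(steps):
        positions.append(positions[-1] + rng.choice((1, -1)))
    return positions
```

Note the invariant that p_i has the same parity as i, a simple non-probabilistic consequence of the ±1 steps.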

© The Author(s) 2018. A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 214–241, 2018. https://doi.org/10.1007/978-3-319-89884-1\_8

The goal of this paper is to develop a programming and reasoning framework for probabilistic computations over infinite objects, such as Markov chains. Although programming and reasoning frameworks for infinite objects and probabilistic computations are well-understood in isolation, their combination is challenging. In particular, one must develop a proof system that is powerful enough for proving interesting properties of probabilistic computations over infinite objects, and practical enough to support effective verification of these properties.

*Modelling Probabilistic Infinite Objects.* A first challenge is to model probabilistic infinite objects. We focus on the case of Markov chains, due to its importance. A (discrete-time) Markov chain is a sequence of random variables {Xi} over some fixed type T satisfying some independence property. Thus, the straightforward way of modelling a Markov chain is as a *stream of distributions* over T. Going back to the simple example outlined above, it is natural to think about this kind of *discrete-time* Markov chain as characterized by the sequence of positions {pi}<sup>i</sup>∈<sup>N</sup>, which in turn can be described as an infinite set indexed by the natural numbers. This suggests that a natural way to model such a Markov chain is to use *streams* in which each element is produced *probabilistically* from the previous one. However, there are some downsides to this representation. First of all, it requires explicit reasoning about probabilistic dependency, since Xi+1 depends on Xi. Also, we might be interested in global properties of the executions of the Markov chain, such as "The probability of passing through the initial state infinitely many times is 1". These properties are naturally expressed as properties of the whole stream. For these reasons, we want to represent Markov chains as *distributions over streams*. Seemingly, one downside of this representation is that the set of streams is not countable, which suggests the need for introducing heavy measure-theoretic machinery in the semantics of the programming language, even when the underlying type is discrete or finite.

Fortunately, measure-theoretic machinery can be avoided (for discrete distributions) by developing a probabilistic extension of the simply-typed guarded λ-calculus and giving a semantic interpretation in the topos of trees [1]. Informally, the simply-typed guarded λ-calculus [1] extends the simply-typed lambda calculus with a *later* modality, denoted by ▷. The type ▷A ascribes expressions that are available one unit of logical time in the future. The ▷ modality allows one to model infinite types by using "finite" approximations. For example, a stream of natural numbers is represented by the sequence of its (increasing) prefixes in the topos of trees. The prefix containing the first i elements has the type S<sub>i</sub> ≜ N × ▷N × ... × ▷<sup>(i−1)</sup>N, representing that the first element is available now, the second element a unit time in the future, and so on. This is the key to representing probability distributions over infinite objects without measure-theoretic semantics: We model probability distributions over non-discrete sets as discrete distributions over their (the sets') approximations. For example, a distribution over streams of natural numbers (which a priori would be non-discrete since the set of streams is uncountable) would be modelled by a *sequence of distributions* over the finite approximations S<sub>1</sub>, S<sub>2</sub>, ... of streams. Importantly, since each S<sub>i</sub> is countable, each of these distributions can be discrete.
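The sequence-of-approximations idea can be imitated concretely: the distribution over length-i prefixes of the simple random walk is a finite, discrete dictionary computed from the previous one. This is only an illustration under our own encoding; the paper works in the topos of trees, not with Python dictionaries:

```python
from collections import defaultdict

def walk_prefix_distributions(n):
    """dists[i] is a discrete distribution over the length-(i+1)
    prefixes of the simple random walk: a finite stand-in for the
    i-th approximation S_i of the stream of positions."""
    dists = [{(0,): 1.0}]            # p_0 = 0 with probability 1
    for _ in range(n - 1):
        nxt = defaultdict(float)
        for prefix, pr in dists[-1].items():
            for step in (1, -1):     # extend each prefix by one +-1 step
                nxt[prefix + (prefix[-1] + step,)] += pr / 2
        dists.append(dict(nxt))
    return dists
```

Each level is a genuine discrete distribution (its probabilities sum to 1), even though the limit object, a distribution over streams, is not discrete.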

*Reasoning About Probabilistic Computations.* Probabilistic computations exhibit a rich set of properties. One natural class of properties is related to probabilities of events, saying, for instance, that the probability of some event E (or of an indexed family of events) increases at every iteration. However, several interesting properties of probabilistic computation, such as stochastic dominance or convergence (defined below) are relational, in the sense that they refer to two runs of two processes. In principle, both classes of properties can be proved using a higher-order logic for probabilistic expressions, e.g. the internal logic of the topos of trees, suitably extended with an axiomatization of finite distributions. However, we contend that an alternative approach inspired from refinement types is desirable and provides better support for effective verification. More specifically, reasoning in a higher-order logic, e.g. in the internal logic of the topos of trees, does not exploit the *structure of programs* for non-relational reasoning, nor the *structural similarities* between programs for relational reasoning. As a consequence, reasoning is more involved. To address this issue, we define a relational proof system that exploits the structure of the expressions and supports syntax-directed proofs, with necessary provisions for escaping the syntax-directed discipline when the expressions do not have the same structure. The proof system manipulates judgements of the form:

$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t\_1 : A\_1 \sim t\_2 : A\_2 \mid \phi$$

where Δ and Γ are two typing contexts, Σ and Ψ respectively denote sets of assertions over variables in these two contexts, t1 and t2 are well-typed expressions of type A1 and A2, and φ is an assertion that may contain the special variables **r**1 and **r**2, which respectively correspond to the values of t1 and t2. The contexts Δ and Γ, the terms t1 and t2, and the types A1 and A2 provide a specification, while Σ, Ψ, and φ are used to reason about relational properties of t1, t2, their inputs and their outputs. This form of judgement is similar to that of Relational Higher-Order Logic [2], from which our system draws inspiration.

In more detail, our relational logic comes with typing rules that allow one to reason about relational properties by exploiting as much as possible the syntactic similarities between t1 and t2, and to fall back on pure logical reasoning when these are not available. In order to apply relational reasoning to guarded computations, the logic provides relational rules for the later modality and for a related modality called "constant". These rules allow the relational verification of general properties that go beyond the traditional notion of program equivalence and, moreover, they allow the verification of properties of guarded computations over different types. The ability to reason about computations of different types provides significant benefits over alternative formalisms for relational reasoning. For example, it enables reasoning about relations between programs working on different data structures, e.g. a program working on a stream of natural numbers and a program working on a stream of pairs of natural numbers, or between programs having different structures, e.g. an application and a case expression.

Importantly, our approach to reasoning formally about probabilistic computations is based on *probabilistic couplings*, a standard tool from the analysis of Markov chains [3,4]. From a verification perspective, probabilistic couplings go beyond equivalence properties of probabilistic programs, which have been studied extensively in the verification literature, and yet support compositional reasoning [5,6]. The main attractive feature of coupling-based reasoning is that it limits the need to reason explicitly about probabilities, which avoids complex verification conditions. We provide sound proof rules for reasoning about probabilistic couplings. Our rules make several improvements over prior relational verification logics based on couplings. First, we support reasoning over probabilistic processes of different types. Second, we use Strassen's theorem [7], a remarkable result about probabilistic couplings, to achieve greater expressivity. Previous systems required proving a bijection between the sampling spaces to show the existence of a coupling [5,6]; Strassen's theorem gives a way to show their existence that is applicable in settings where the bijection-based approach cannot be used. Third, we support reasoning with so-called shift couplings, which relate the states of two Markov chains at possibly different times (more explanation below).

*Case Studies.* We show the flexibility of our formalism by verifying several examples of relational properties of probabilistic computations, and Markov chains in particular. These examples cannot be verified with existing approaches.

First, we verify a classic example of probabilistic non-interference which requires the reasoning about computations at different types. Second, in the context of Markov chains, we verify an example about stochastic dominance which exercises our more general rule for proving the existence of couplings modelled by expressions of different types. Finally, we verify an example involving shift relations in an infinite computation. This style of reasoning is motivated by "shift" couplings in Markov chains. In contrast to a standard coupling, which relates the states of two Markov chains at the same time t, a shift coupling relates the states of two Markov chains at possibly different times. Our specific example relates a standard random walk (described earlier) to a variant called a lazy random walk; the verification requires relating the state of standard random walk at time t to the state of the lazy random walk at time 2t. We note that this kind of reasoning is impossible with conventional relational proof rules even in a non-probabilistic setting. Therefore, we provide a novel family of proof rules for reasoning about shift relations. At a high level, the rules combine a careful treatment of the later and constant modalities with a refined treatment of fixpoint operators, allowing us to relate different iterates of function bodies.

#### **Summary of Contributions**

With the aim of providing a general framework for programming and reasoning about Markov chains, the three main contributions of this work are:


1. A probabilistic extension of the simply-typed guarded λ-calculus with discrete distributions over discrete types, suitable for programming infinite stochastic processes such as Markov chains.

2. A relational program logic, Guarded RHOL, built on top of Guarded Higher-Order Logic, that supports syntax-directed relational reasoning about programs that have different types and structures. Additionally, this logic uses results from the coupling literature to achieve greater expressivity than previous systems.

3. An extension of the relational logic that makes it possible to relate the states of two streams at possibly different times. This extension supports reasoning principles, such as shift couplings, that escape conventional relational logics.

Omitted technical details can be found in the full version of the paper with appendix at https://arxiv.org/abs/1802.09787.

### **2 Mathematical Preliminaries**

This section reviews the definition of discrete probability sub-distributions and introduces mathematical couplings.

**Definition 1 (Discrete probability distribution).** *Let* C *be a discrete (i.e., finite or countable) set. A (total) distribution over* C *is a function* μ : C → [0, 1] *such that* $\sum\_{x \in C} \mu(x) = 1$*. The support of a distribution* μ *is the set of points with non-zero probability,* supp(μ) ≜ {x ∈ C | μ(x) > 0}*. We denote the set of distributions over* C *as* D(C)*. Given a subset* E ⊆ C*, the probability of sampling from* μ *a point in* E *is denoted* $\Pr\_{x \leftarrow \mu}[x \in E]$*, and is equal to* $\sum\_{x \in E} \mu(x)$*.*

**Definition 2 (Marginals).** *Let* μ *be a distribution over a product space* C<sup>1</sup> × C2*. The first (second) marginal of* μ *is another distribution* D(π1)(μ) (D(π2)(μ)) *over* C<sup>1</sup> (C2) *defined as:*

$$\mathsf{D}(\pi\_1)(\mu)(x) = \sum\_{y \in C\_2} \mu(x, y) \qquad \left(\mathsf{D}(\pi\_2)(\mu)(y) = \sum\_{x \in C\_1} \mu(x, y)\right)$$
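For discrete distributions, Definition 2 is a pair of summations that can be computed directly; the following sketch encodes a distribution over a product space as a dictionary from pairs to probabilities (an encoding we choose for illustration):

```python
def marginals(mu):
    """D(pi_1)(mu) and D(pi_2)(mu) from Definition 2: sum out the
    other component of each pair."""
    m1, m2 = {}, {}
    for (x, y), p in mu.items():
        m1[x] = m1.get(x, 0.0) + p
        m2[y] = m2.get(y, 0.0) + p
    return m1, m2
```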

**Probabilistic Couplings.** Probabilistic couplings are a fundamental tool in the analysis of Markov chains. When analyzing a relation between two probability distributions it is sometimes useful to consider instead a distribution over the product space that somehow "couples" the randomness in a convenient manner.

Consider for instance the following Markov chain, which counts the total number of tails observed when repeatedly tossing a biased coin with probability p of tails:

$$n\_0 = 0 \qquad n\_{i+1} = \begin{cases} n\_i + 1 \text{ with probability } p \\ n\_i \text{ with probability } (1 - p) \end{cases}$$

If we have two biased coins with probabilities of tails p and q with p ≤ q, and we respectively observe the processes {n<sub>i</sub>} and {m<sub>i</sub>}, we would expect that, in some sense, n<sub>i</sub> ≤ m<sub>i</sub> should hold for all i (this property is known as stochastic dominance). A formal proof of this fact using elementary tools from probability theory would require computing the cumulative distribution functions of n<sub>i</sub> and m<sub>i</sub> and comparing them. The coupling method reduces this proof to exhibiting a way to pair the coin flips so that whenever the first coin shows tails, so does the second.
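One such pairing draws a single uniform value per step and lets both coins read it; a small simulation sketch (our own construction, for illustration) shows that the dominance then holds pointwise on every run:

```python
import random

def coupled_tail_counts(p, q, steps, seed=1):
    """Couple the two coins through one uniform draw per step: the
    p-coin shows tails iff u < p, the q-coin iff u < q. Since p <= q,
    tails of the first coin forces tails of the second, so n_i <= m_i."""
    rng = random.Random(seed)
    n = m = 0
    pairs = []
    for _ in range(steps):
        u = rng.random()
        n += u < p
        m += u < q
        pairs.append((n, m))
    return pairs
```

The inequality n_i ≤ m_i holds deterministically along the coupled run, which is exactly the intuition the coupling method formalizes.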

We now review the definition of couplings and state relevant properties.

**Definition 3 (Couplings).** *Let* μ1 ∈ D(C1) *and* μ2 ∈ D(C2)*, and* R ⊆ C1 × C2*. An* R*-coupling for* μ1 *and* μ2 *is a distribution* μ ∈ D(C1 × C2) *such that (1) the marginals of* μ *are* μ1 *and* μ2*, i.e.* D(π1)(μ) = μ1 *and* D(π2)(μ) = μ2*, and (2) the support of* μ *is contained in* R*, i.e.* $\Pr\_{(x\_1,x\_2) \leftarrow \mu}[R\ x\_1\ x\_2] = 1$*.*

*Moreover, we say that* μ1 *and* μ2 *are related by the lifting of* R *iff there exists an* R*-coupling for* μ1 *and* μ2*.*

Couplings always exist. For instance, the product distribution of two distributions is always a coupling. Going back to the example about the two coins, it can be proven by computation that the following is a coupling that lifts the less-or-equal relation (0 indicating heads and 1 indicating tails):

$$\mu(0,0) = 1 - q \qquad \mu(0,1) = q - p \qquad \mu(1,0) = 0 \qquad \mu(1,1) = p$$
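This coupling of the two biased coins can be materialized and checked mechanically; the sketch below (our own dictionary encoding, with 0 for heads and 1 for tails) builds it so that its marginals and support condition can be verified directly:

```python
def coin_coupling(p, q):
    """The <=-lifting coupling for two biased coins with tail
    probabilities p and q, assuming p <= q: all mass sits on pairs
    (x, y) with x <= y."""
    return {(0, 0): 1 - q, (0, 1): q - p, (1, 0): 0.0, (1, 1): p}
```

Its first marginal assigns probability p to tails and its second marginal q, as required, and the only pair violating ≤, namely (1, 0), carries probability 0.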

The following theorem in [7] gives a necessary and sufficient condition for the existence of R-couplings between two distributions. The theorem is remarkable in the sense that it proves an equivalence between an existential property (namely the existence of a particular coupling) and a universal property (checking, for each event, an inequality between probabilities).

**Theorem 1 (Strassen's theorem).** *Consider* μ1 ∈ D(C1) *and* μ2 ∈ D(C2)*, and* R ⊆ C1 × C2*. Then an* R*-coupling for* μ1 *and* μ2 *exists iff for every* X ⊆ C1*,* $\Pr\_{x\_1 \leftarrow \mu\_1}[x\_1 \in X] \le \Pr\_{x\_2 \leftarrow \mu\_2}[x\_2 \in R(X)]$*, where* R(X) *is the image of* X *under* R*, i.e.* R(X) = {y ∈ C2 | ∃x ∈ X. R x y}*.*
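For finite supports, the universal side of Strassen's criterion can be checked by brute force over all subsets; a small sketch (our own encoding, exponential and meant only for tiny examples):

```python
from itertools import chain, combinations

def strassen_holds(mu1, mu2, R):
    """Check Pr_{mu1}[X] <= Pr_{mu2}[R(X)] for every subset X of the
    support of mu1; by Strassen's theorem this decides whether an
    R-coupling of mu1 and mu2 exists (finite case)."""
    points = list(mu1)
    subsets = chain.from_iterable(
        combinations(points, k) for k in range(len(points) + 1))
    for X in subsets:
        RX = {y for x in X for y in mu2 if (x, y) in R}
        if sum(mu1[x] for x in X) > sum(mu2[y] for y in RX) + 1e-12:
            return False
    return True
```

On the two-coin example, the criterion holds for tail probabilities p ≤ q and fails when p > q, matching the stochastic-dominance intuition.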

An important property of couplings is closure under sequential composition.

**Lemma 1 (Sequential composition of couplings).** *Let* μ1 ∈ D(C1)*,* μ2 ∈ D(C2)*,* M1 : C1 → D(D1) *and* M2 : C2 → D(D2)*. Moreover, let* R ⊆ C1 × C2 *and* S ⊆ D1 × D2*. Assume: (1) there exists an* R*-coupling for* μ1 *and* μ2*; and (2) for every* x1 ∈ C1 *and* x2 ∈ C2 *such that* R x1 x2*, there exists an* S*-coupling for* M1(x1) *and* M2(x2)*. Then there exists an* S*-coupling for* bind μ1 M1 *and* bind μ2 M2*, where* bind μ M *is defined as*

$$(\text{bind } \mu \ M)(y) = \sum\_{x} \mu(x) \cdot M(x)(y)$$
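The bind operation itself is a finite double summation and can be sketched directly, with distributions as dictionaries (our own encoding):

```python
def bind(mu, M):
    """(bind mu M)(y) = sum_x mu(x) * M(x)(y): push mu forward
    through the probabilistic kernel M."""
    out = {}
    for x, px in mu.items():
        for y, py in M(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out
```

For example, binding a fair coin over {0, 1} to a kernel that adds a second fair ±0/+1 step yields the distribution {0: 0.25, 1: 0.5, 2: 0.25}, one step of the tail-counting chain above.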

We conclude this section with the following lemma, which follows from Strassen's theorem:

**Lemma 2 (Fundamental lemma of couplings).** *Let* R ⊆ C1 × C2*,* E1 ⊆ C1 *and* E2 ⊆ C2 *be such that for every* x1 ∈ E1 *and* x2 ∈ C2*,* R x1 x2 *implies* x2 ∈ E2*, i.e.* R(E1) ⊆ E2*. Moreover, let* μ1 ∈ D(C1) *and* μ2 ∈ D(C2) *be such that there exists an* R*-coupling for* μ1 *and* μ2*. Then*

$$\Pr\_{x\_1 \leftarrow \mu\_1}[x\_1 \in E\_1] \le \Pr\_{x\_2 \leftarrow \mu\_2}[x\_2 \in E\_2]$$

This lemma can be used to prove probabilistic inequalities from the existence of suitable couplings:

**Corollary 1.** *Let* μ1, μ2 ∈ D(C)*. Then:*

*1. If there exists an* (=)*-coupling for* μ1 *and* μ2*, then for all* x ∈ C*,* μ1(x) = μ2(x)*.*
*2. If* C = N *and there exists a* (≥)*-coupling for* μ1 *and* μ2*, then for all* n ∈ N*,* $\Pr\_{x \leftarrow \mu\_1}[x \ge n] \ge \Pr\_{x \leftarrow \mu\_2}[x \ge n]$*.*

In the example at the beginning of the section, the property we want to prove is precisely that, for every k and i, the following holds:

$$\Pr\_{x\_1 \leftarrow n\_i}[x\_1 \ge k] \le \Pr\_{x\_2 \leftarrow m\_i}[x\_2 \ge k]$$

Since we have a ≤-coupling, this proof is immediate. This example is formalized in Subsect. 3.3.

### **3 Overview of the System**

In this section we give a high-level overview of our system, with the details given in Sects. 4, 5 and 6. We start by presenting the base logic; we then show how to extend it with probabilities and how to build a relational reasoning system on top of it.

### **3.1 Base Logic: Guarded Higher-Order Logic**

Our starting point is Guarded Higher-Order Logic [1] (Guarded HOL), inspired by the topos of trees. In addition to the usual constructs of HOL for reasoning about lambda terms, this logic features the ▷ (later) and constant modalities for reasoning about infinite terms, in particular streams. The ▷ modality is used to reason about objects that will be available in the future, such as tails of streams. For instance, suppose we want to define an All(s, φ) predicate, expressing that all elements of a stream s ≡ n::xs satisfy a property φ. This can be axiomatized as follows:

$$(\forall xs: \rhd \text{Str}\_{\mathbb{N}})(n:\mathbb{N}).\phi\text{ }n \Rightarrow \rhd \left[s \leftarrow xs\right].\text{All}(s,x.\phi) \Rightarrow \text{All}(n::xs,x.\phi)$$

We use x.φ to denote that the formula φ depends on a free variable x, which will get replaced by the first argument of All. We have two antecedents. The first one states that the head n satisfies φ. The second one, ▷[s ← xs].All(s, x.φ), states that all elements of xs satisfy φ. Formally, xs is the tail of the stream and will only be available in the future, so it has type ▷Str_ℕ. The *delayed substitution* [s ← xs] replaces s of type Str_ℕ with xs of type ▷Str_ℕ inside All and shifts the whole formula one step into the future. In other words, ▷[s ← xs].All(s, x.φ) states that All(−, x.φ) will be satisfied by xs in the future, once it is available.

#### **3.2 A System for Relational Reasoning**

When proving relational properties it is often convenient to build proofs guided by the syntactic structure of the two expressions to be related. This style of reasoning is particularly appealing when the two expressions have the same structure and control-flow, and is appealingly close to the traditional style of reasoning supported by refinement types. At the same time, a strict adherence to the syntax-directed discipline is detrimental to the expressiveness of the system; for instance, it makes it difficult or even impossible to reason about structurally dissimilar terms. To achieve the best of both worlds, we present a relational proof system built on top of Guarded HOL, which we call Guarded RHOL. Judgements have the shape:

$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t\_1 : A\_1 \sim t\_2 : A\_2 \mid \phi$$

where φ is a logical formula that may contain two distinguished variables **r**₁ and **r**₂ that respectively represent the expressions t₁ and t₂. This judgement subsumes two typing judgements on t₁ and t₂ and a relation φ between these two expressions. However, this form of judgement does not tie the logical property to the type of the expressions, which is key to achieving flexibility while supporting syntax-directed proofs whenever needed. The proof system combines rules of two different flavours: two-sided rules, which relate expressions with the same top-level constructs, and one-sided rules, which operate on a single expression.

We then extend Guarded HOL with a modality ◇ that lifts assertions over discrete types C₁ and C₂ to assertions over D(C₁) and D(C₂). Concretely, we define for every assertion φ, variables x₁ and x₂ of type C₁ and C₂ respectively, and expressions t₁ and t₂ of type D(C₁) and D(C₂) respectively, the modal assertion ◇[x₁←t₁, x₂←t₂]φ, which holds iff the interpretations of t₁ and t₂ are related by the probabilistic lifting of the interpretation of φ. We call this new logic Probabilistic Guarded HOL.

We accordingly extend the relational proof system to support reasoning about probabilistic expressions by adding judgements of the form:

$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t\_1 : \mathsf{D}(C\_1) \sim t\_2 : \mathsf{D}(C\_2) \mid \Diamond\_{[x\_1 \leftarrow \mathbf{r}\_1, x\_2 \leftarrow \mathbf{r}\_2]} \phi$$

expressing that t<sup>1</sup> and t<sup>2</sup> are distributions related by a φ-coupling. We call this proof system Probabilistic Guarded RHOL. These judgements can be built by using the following rule, that lifts relational judgements over discrete types C<sup>1</sup> and C<sup>2</sup> to judgements over distribution types D(C1) and D(C2) when the premises of Strassen's theorem are satisfied.

$$\frac{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \forall X\_1 \subseteq C\_1.\ \Pr\_{y\_1 \leftarrow t\_1}[y\_1 \in X\_1] \le \Pr\_{y\_2 \leftarrow t\_2}[\exists y\_1 \in X\_1.\ \phi]}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t\_1 : \mathsf{D}(C\_1) \sim t\_2 : \mathsf{D}(C\_2) \mid \Diamond\_{[y\_1 \leftarrow \mathbf{r}\_1, y\_2 \leftarrow \mathbf{r}\_2]} \phi}\ \text{[Coupling]}$$

Recall that (discrete-time) Markov chains are "memoryless" probabilistic processes, whose specification is given by a (discrete) set C of states, an initial state s₀ and a probabilistic transition function step : C → D(C), where D(C) represents the set of discrete distributions over C. As explained in the introduction, a convenient way to model Markov chains is by means of probabilistic streams, i.e. to model a Markov chain as an element of D(Str_S), where S is its underlying state space. To this end, we introduce a markov operator with type C → (C → D(C)) → D(Str_C) that, given an initial state and a transition function, returns a Markov chain. We can reason about Markov chains by the [Markov] rule (the context, omitted, does not change):

$$\frac{\begin{array}{c} \vdash t\_1 : C\_1 \sim t\_2 : C\_2 \mid \phi \qquad \vdash \psi\_4 \\ \vdash h\_1 : C\_1 \to \mathsf{D}(C\_1) \sim h\_2 : C\_2 \to \mathsf{D}(C\_2) \mid \psi\_3 \end{array}}{\vdash \mathrm{markov}(t\_1, h\_1) : \mathsf{D}(\mathrm{Str}\_{C\_1}) \sim \mathrm{markov}(t\_2, h\_2) : \mathsf{D}(\mathrm{Str}\_{C\_2}) \mid \Diamond\_{[y\_1 \leftarrow \mathbf{r}\_1, y\_2 \leftarrow \mathbf{r}\_2]} \phi'}\ \text{Markov}$$

$$\text{where}\begin{cases} \psi\_3 \equiv \forall x\_1\, x\_2.\ \phi[x\_1/\mathbf{r}\_1][x\_2/\mathbf{r}\_2] \Rightarrow \Diamond\_{[y\_1 \leftarrow \mathbf{r}\_1\ x\_1,\ y\_2 \leftarrow \mathbf{r}\_2\ x\_2]}\ \phi[y\_1/\mathbf{r}\_1][y\_2/\mathbf{r}\_2] \\ \psi\_4 \equiv \forall x\_1\ x\_2\ xs\_1\ xs\_2.\ \phi[x\_1/\mathbf{r}\_1][x\_2/\mathbf{r}\_2] \Rightarrow \rhd[y\_1 \leftarrow xs\_1, y\_2 \leftarrow xs\_2]\ \phi' \Rightarrow \\ \qquad\qquad \phi'[x\_1 :: xs\_1/y\_1][x\_2 :: xs\_2/y\_2] \end{cases}$$

Informally, the rule stipulates the existence of an invariant φ over pairs of states. The first premise insists that the invariant holds on the initial states, the condition ψ₃ states that the transition functions preserve the invariant, and ψ₄ states that the invariant φ over pairs of states can be lifted to a stream property φ′.

Other rules of the logic are given in Fig. 1. The language construct munit creates a point distribution whose entire mass is at its argument. Accordingly, the [UNIT] rule creates a straightforward coupling. The [MLET] rule internalizes sequential composition of couplings (Lemma 1) into the proof system. The construct let x = t in t′ composes a distribution t with a probabilistic computation t′ with one free variable x by sampling x from t and running t′. The [MLET-L] rule supports one-sided reasoning about let x = t in t′ and relies on the fact that couplings are closed under convex combinations. Note that one premise of the rule uses a unary judgement, with a non-relational modality ◇[x←**r**]φ whose informal meaning is that φ holds with probability 1 in the distribution **r**.
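For intuition, the behaviour of munit and of the sequencing construct let x = t in t′ on finite distributions can be sketched in ordinary code. This is only an illustrative model of the two constructs over finite distributions, not the calculus itself; `mlet` plays the role of let-binding:

```python
from fractions import Fraction as F

def munit(x):
    """Point distribution: all of the mass sits on x."""
    return {x: F(1)}

def mlet(t, f):
    """let x = t in t': sample x from t, then run the computation f(x),
    accumulating the resulting probabilities."""
    out = {}
    for x, p in t.items():
        for y, q in f(x).items():
            out[y] = out.get(y, F(0)) + p * q
    return out

# Example: sample a fair coin b, then return munit(1 + b).
coin = {0: F(1, 2), 1: F(1, 2)}
d = mlet(coin, lambda b: munit(1 + b))
assert d == {1: F(1, 2), 2: F(1, 2)}
```

The nested loop in `mlet` is exactly the convex combination of the distributions f(x), weighted by t, that the [MLET-L] rule exploits.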

The following table summarizes the different base logics we consider, the relational systems we build on top of them, including the ones presented in [2], and the equivalences between both sides:


### **3.3 Examples**

We formalize elementary examples from the literature on security and Markov chains. None of these examples can be verified in prior systems. Uniformity of *one-time pad* and lumping of *random walks* cannot even be stated in prior systems because the two related expressions in these examples have different types. The *random walk vs lazy random walk* (shift coupling) cannot be proved in prior systems because it requires either asynchronous reasoning or code rewriting. Finally, the *biased coin* example (stochastic dominance) cannot be proved in prior work because it requires Strassen's formulation of the existence of couplings (rather than a bijection-based formulation) or code rewriting. We give additional details below.

**Fig. 1.** Proof rules for probabilistic constructs

**One-Time Pad/Probabilistic Non-interference.** Non-interference [8] is a baseline information flow policy that is often used to model confidentiality of computations. In its simplest form, non-interference distinguishes between public (or low) and private (or high) variables and expressions, and requires that the result of a public expression not depend on the value of its private parameters. This definition naturally extends to probabilistic expressions, except that in this case the evaluation of an expression yields a distribution rather than a value. There are deep connections between probabilistic non-interference and several notions of (information-theoretic) security from cryptography. In this paragraph, we illustrate different flavours of security properties for one-time pad encryption. Similar reasoning can be carried out for proving (passive) security of secure multiparty computation algorithms in the 3-party or multi-party setting [9,10].

One-time pad is a perfectly secure symmetric encryption scheme. Its space of plaintexts, ciphertexts and keys is the set {0, 1}^ℓ of fixed-length bitstrings of size ℓ. The encryption algorithm is parametrized by a key k, sampled uniformly over the set of bitstrings {0, 1}^ℓ, and maps every plaintext m to the ciphertext c = k ⊕ m, where the operator ⊕ denotes bitwise exclusive-or on bitstrings. We let otp denote the expression λm. let k = U_{{0,1}^ℓ} in munit(k ⊕ m), where U_X is the uniform distribution over a finite set X.
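For a small ℓ, both perfect security and uniformity of the ciphertext distribution can be checked by brute-force enumeration. The sketch below (with an illustrative choice ℓ = 3; `ELL`, `KEYS` and `otp` are our own names) computes the exact ciphertext distribution of otp for each message:

```python
from fractions import Fraction as F

ELL = 3                     # bitstring length (illustrative choice)
KEYS = range(2 ** ELL)      # {0,1}^ELL encoded as integers

def otp(m):
    """Exact distribution of k XOR m for k uniform over {0,1}^ELL."""
    return {k ^ m: F(1, 2 ** ELL) for k in KEYS}

# Probabilistic non-interference: the ciphertext distribution is the
# same for any two plaintexts m1, m2.
assert all(otp(m1) == otp(m2) for m1 in KEYS for m2 in KEYS)

# The stronger property: the ciphertext distribution is uniform.
uniform = {c: F(1, 2 ** ELL) for c in KEYS}
assert all(otp(m) == uniform for m in KEYS)
```

The second assertion is the simulation-style statement: otp m is distributionally equal to the uniform sampler, with no access to m.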

One-time pad achieves perfect security, i.e. the distributions of ciphertexts is independent of the plaintext. Perfect security can be captured as a probabilistic non-interference property:

$$\vdash \mathsf{otp} : \{0, 1\}^{\ell} \to \mathsf{D}(\{0, 1\}^{\ell}) \sim \mathsf{otp} : \{0, 1\}^{\ell} \to \mathsf{D}(\{0, 1\}^{\ell}) \mid \forall m\_1 m\_2. \mathbf{r}\_1 \; m\_1 \overset{\diamond}{=} \mathbf{r}\_2 \; m\_2$$

where e₁ ◇= e₂ is used as a shorthand for ◇[y₁←e₁, y₂←e₂] y₁ = y₂. The crux of the proof is to establish

$$m\_1, m\_2 : \{0, 1\}^\ell \vdash \mathcal{U}\_{\{0, 1\}^\ell} : \mathsf{D}(\{0, 1\}^\ell) \sim \mathcal{U}\_{\{0, 1\}^\ell} : \mathsf{D}(\{0, 1\}^\ell) \mid \mathsf{r}\_1 \oplus m\_2 \overset{\diamond}{=} \mathsf{r}\_2 \oplus m\_1$$

using the [COUPLING] rule. It suffices to observe that the assertion induces a bijection, so the image of an arbitrary set X under the relation has the same cardinality as X, and hence their probabilities w.r.t. the uniform distributions are equal. One can then conclude the proof by applying the rules for monadic sequencing ([MLET]) and abstraction (rule [ABS] in the appendix), using algebraic properties of ⊕.

Interestingly, one can prove a stronger property: rather than proving that the ciphertext is independent of the plaintext, one can prove that the distribution of ciphertexts is uniform. This is captured by the following judgement:

$$c\_1, c\_2 : \{0,1\}^\ell \vdash \mathsf{otp} : \{0,1\}^\ell \to \mathsf{D}(\{0,1\}^\ell) \sim \mathsf{otp} : \{0,1\}^\ell \to \mathsf{D}(\{0,1\}^\ell) \mid \psi$$

where ψ ≜ ∀m₁ m₂. m₁ = m₂ ⇒ ◇[y₁←**r**₁ m₁, y₂←**r**₂ m₂] (y₁ = c₁ ⇔ y₂ = c₂). This style of modelling uniformity as a relational property is inspired by [11]. The proof is similar to the previous one and omitted. However, it is arguably more natural to model uniformity of the distribution of ciphertexts by the judgement:

$$\vdash \mathsf{otp} : \{0, 1\}^{\ell} \to \mathsf{D}(\{0, 1\}^{\ell}) \sim \mathcal{U}\_{\{0,1\}^{\ell}} : \mathsf{D}(\{0, 1\}^{\ell}) \mid \forall m.\ \mathbf{r}\_1\ m \overset{\diamond}{=} \mathbf{r}\_2$$

This judgement is closer to the simulation-based notion of security that is used pervasively in cryptography, and notably in Universal Composability [12]. Specifically, the statement captures the fact that the one-time pad algorithm can be simulated without access to the message. It is interesting to note that the judgement above (and more generally simulation-based security) could not be expressed in prior works, since the two expressions of the judgement have different types—note that in this specific case, the right expression is a distribution but in the general case the right expression will also be a function, and its domain will be a projection of the domain of the left expression.

The proof proceeds as follows. First, we prove

$$\vdash \mathcal{U}\_{\{0,1\}^\ell} \sim \mathcal{U}\_{\{0,1\}^\ell} \mid \forall m. \; \Diamond\_{\{y\_1 \leftarrow \mathbf{r}\_1, y\_2 \leftarrow \mathbf{r}\_2\}} y\_1 \oplus m = y\_2.$$

using the [COUPLING] rule. Then, we apply the [MLET] rule to obtain

$$\begin{array}{c} \vdash \begin{array}{l} \text{let } k = \mathcal{U}\_{\{0,1\}^{\ell}} \text{ in } \\ \text{munit}(k \oplus m) \end{array} \sim \begin{array}{l} \text{let } k = \mathcal{U}\_{\{0,1\}^{\ell}} \text{ in } \\ \text{munit}(k) \end{array} \mid \diamond\_{\left[y\_{1} \leftarrow \mathbf{r}\_{1}, y\_{2} \leftarrow \mathbf{r}\_{2}\right]} y\_{1} = y\_{2} \end{array}$$

We have let k = U_{{0,1}^ℓ} in munit(k) ≡ U_{{0,1}^ℓ}; hence by equivalence (rule [Equiv] in the appendix), this entails

$$\vdash \mathsf{let}\ k = \mathcal{U}\_{\{0,1\}^\ell}\ \mathsf{in}\ \mathsf{munit}(k \oplus m) \sim \mathcal{U}\_{\{0,1\}^\ell} \mid \Diamond\_{[y\_1 \leftarrow \mathbf{r}\_1, y\_2 \leftarrow \mathbf{r}\_2]} y\_1 = y\_2$$

We conclude by applying the one-sided rule for abstraction.

**Stochastic Dominance.** Stochastic dominance defines a partial order between random variables whose underlying set is itself a partial order; it has many different applications in statistical biology (e.g. in the analysis of birth-and-death processes), statistical physics (e.g. in percolation theory), and economics. First-order stochastic dominance, which we define below, is also an important application of probabilistic couplings. We demonstrate how to use our proof system for proving (first-order) stochastic dominance for a simple Markov process which samples biased coins. While the example is elementary, the proof method extends to more complex examples of stochastic dominance, and illustrates the benefits of Strassen's formulation of the coupling rule over alternative formulations stipulating the existence of bijections (explained later).

We start by recalling the definition of (first-order) stochastic dominance for the N-valued case. The definition extends to arbitrary partial orders.

**Definition 4 (Stochastic dominance).** *Let* <sup>μ</sup>1, μ<sup>2</sup> <sup>∈</sup> <sup>D</sup>(N)*. We say that* <sup>μ</sup><sup>2</sup> *stochastically dominates* <sup>μ</sup>1*, written* <sup>μ</sup><sup>1</sup> <sup>≤</sup>SD <sup>μ</sup>2*, iff for every* <sup>n</sup> <sup>∈</sup> <sup>N</sup>*,*

$$\Pr\_{x \leftarrow \mu\_1}[x \ge n] \le \Pr\_{x \leftarrow \mu\_2}[x \ge n]$$

The following result, equivalent to Corollary 1, characterizes stochastic dominance using probabilistic couplings.

**Proposition 1.** *Let* μ₁, μ₂ ∈ D(ℕ)*. Then* μ₁ ≤_SD μ₂ *iff* μ₁ *and* μ₂ *are related by the lifting* ◇(≤)*.*

We now turn to the definition of the Markov chain. For p ∈ [0, 1], we consider the parametric ℕ-valued Markov chain coins ≜ markov(0, h), with initial state 0 and (parametric) step function:

$$h \triangleq \lambda x.\ \mathsf{let}\ b = \mathcal{B}(p)\ \mathsf{in}\ \mathsf{munit}(x + b)$$

where, for p ∈ [0, 1], B(p) is the Bernoulli distribution on {0, 1} with probability p for 1 and 1 − p for 0. Our goal is to establish that coins is monotonic, i.e. for every p1, p<sup>2</sup> ∈ [0, 1], p<sup>1</sup> ≤ p<sup>2</sup> implies coins p<sup>1</sup> ≤SD coins p2. We formalize this statement as

$$\vdash \mathsf{coins} : [0, 1] \to \mathsf{D}(\mathrm{Str}\_{\mathbb{N}}) \sim \mathsf{coins} : [0, 1] \to \mathsf{D}(\mathrm{Str}\_{\mathbb{N}}) \mid \psi$$

where ψ ≜ ∀p₁ p₂. p₁ ≤ p₂ ⇒ ◇[y₁←**r**₁ p₁, y₂←**r**₂ p₂] All(y₁, y₂, z₁.z₂. z₁ ≤ z₂). The crux of the proof is to establish stochastic dominance for the Bernoulli distribution:

$$p\_1: [0, 1], p\_2: [0, 1] \mid p\_1 \le p\_2 \vdash \mathcal{B}(p\_1): \mathcal{D}(\mathbb{N}) \sim \mathcal{B}(p\_2): \mathcal{D}(\mathbb{N}) \mid \mathbf{r}\_1 \overset{\diamond}{\le} \mathbf{r}\_2$$

where we use e₁ ◇≤ e₂ as shorthand for ◇[y₁←e₁, y₂←e₂] y₁ ≤ y₂. This is proved directly by the [COUPLING] rule and checking by simple calculations that the premise of the rule is valid.

We briefly explain how to conclude the proof. Let h<sup>1</sup> and h<sup>2</sup> be the step functions for p<sup>1</sup> and p<sup>2</sup> respectively. It is clear from the above that (context omitted):

$$x\_1 \le x\_2 \vdash h\_1 \; x\_1 : \mathsf{D}(\mathbb{B}) \sim h\_2 \; x\_2 : \mathsf{D}(\mathbb{B}) \mid \diamond\_{[y\_1 \leftarrow \mathsf{r}\_1, y\_2 \leftarrow \mathsf{r}\_2]} y\_1 \le y\_2$$

and by the definition of All:

$$x\_1 \le x\_2 \Rightarrow \text{All}(xs\_1, xs\_2, z\_1.z\_2.z\_1 \le z\_2) \Rightarrow \text{All}(x\_1 :: \rhd xs\_1, x\_2 :: \rhd xs\_2, z\_1.z\_2.z\_1 \le z\_2)$$

So, we can conclude by applying the [Markov] rule.
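The dominance property proved above can also be checked concretely for finitely many steps of the chain by computing the exact position distributions. The following is an illustrative sketch (the helper names `step_dist` and `coins_at` are ours, and only a finite prefix of the chain is inspected):

```python
from fractions import Fraction as F

def bernoulli(p):
    """B(p): 1 with probability p, 0 with probability 1 - p."""
    return {0: 1 - p, 1: p}

def step_dist(dist, p):
    """Push the position distribution through one step x -> x + B(p)."""
    out = {}
    for x, q in dist.items():
        for b, r in bernoulli(p).items():
            out[x + b] = out.get(x + b, F(0)) + q * r
    return out

def coins_at(p, n):
    """Exact distribution of the chain's position after n steps, from 0."""
    d = {0: F(1)}
    for _ in range(n):
        d = step_dist(d, p)
    return d

p1, p2, n = F(1, 3), F(2, 3), 5
d1, d2 = coins_at(p1, n), coins_at(p2, n)
# First-order stochastic dominance at time n: tails are pointwise larger.
for k in range(n + 1):
    assert (sum(q for x, q in d1.items() if x >= k)
            <= sum(q for x, q in d2.items() if x >= k))
```

This checks ≤_SD at each finite time; the [Markov] rule is what lets the formal proof conclude the property for the whole (infinite) streams.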

It is instructive to compare our proof with prior formalizations, and in particular with the proof in [5]. Their proof is carried out in the pRHL logic, whose [COUPLING] rule is based on the existence of a bijection that satisfies some property, rather than on our formalization based on Strassen's theorem. Their rule is motivated by applications in cryptography, and works well for many examples, but is inconvenient for our example at hand, which involves non-uniform probabilities. Indeed, their proof is based on code rewriting, and is done in two steps. First, they prove equivalence between sampling and returning x₁ from B(p₁); and sampling z₁ from B(p₂), z₂ from B(p₁/p₂) and returning z = z₁ ∧ z₂. Then, they find a coupling between z and B(p₂).
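The equivalence underlying that code-rewriting step is easy to check exactly: a conjunction of independent B(p₂) and B(p₁/p₂) samples is distributed as B(p₁) whenever p₁ ≤ p₂. A small standalone sketch (function names are ours):

```python
from fractions import Fraction as F

def bernoulli(p):
    return {0: 1 - p, 1: p}

def conj_of_bernoullis(p1, p2):
    """Distribution of z1 AND z2 for independent z1 ~ B(p2), z2 ~ B(p1/p2).
    Requires 0 < p1 <= p2."""
    out = {0: F(0), 1: F(0)}
    for z1, q1 in bernoulli(p2).items():
        for z2, q2 in bernoulli(p1 / p2).items():
            out[z1 & z2] += q1 * q2
    return out

p1, p2 = F(1, 3), F(3, 4)
# Pr[z = 1] = p2 * (p1/p2) = p1, so the rewritten program equals B(p1).
assert conj_of_bernoullis(p1, p2) == bernoulli(p1)
```

This is exactly the program equivalence the pRHL proof must establish by hand before its bijection-based coupling rule becomes applicable; the Strassen-style [COUPLING] rule avoids the detour.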

**Shift Coupling: Random Walk vs Lazy Random Walk.** The previous example is an instance of a lockstep coupling, in that it relates the k-th element of the first chain with the k-th element of the second chain. Many examples from the literature follow this lockstep pattern; however, it is not always possible to establish lockstep couplings. Shift couplings are a relaxation of lockstep couplings where we relate elements of the first and second chains without the requirement that their positions coincide.

We consider a simple example that motivates the use of shift couplings. Consider the random walk and lazy random walk (which, at each time step, either chooses to move or stay put), both defined as Markov chains over Z. For simplicity, assume that both walks start at position 0. It is not immediate to find a coupling between the two walks, since the two walks necessarily get desynchronized whenever the lazy walk stays put. Instead, the trick is to consider a lazy random walk that moves two steps instead of one. The random walk and the lazy random walk of step 2 are defined by the step functions:

$$\begin{array}{l} \mathsf{step} \triangleq \lambda x.\ \mathsf{let}\ z = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in}\ \mathsf{munit}(z + x) \\ \mathsf{lstep2} \triangleq \lambda x.\ \mathsf{let}\ z = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in}\ \mathsf{let}\ b = \mathcal{U}\_{\{0,1\}}\ \mathsf{in}\ \mathsf{munit}(x + 2 * z * b) \end{array}$$

After 2 iterations of step, the position has either changed two steps to the left or to the right, or has returned to the initial position, which is the same behaviour lstep2 has on every iteration. Therefore, the coupling we want to find should equate the elements at position 2i in step with the elements at position i in lstep2. The details on how to prove the existence of this coupling are in Sect. 6.
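The key equation behind this shift coupling, that two iterations of step have exactly the same distribution as one iteration of lstep2, can be verified by exact computation. A standalone sketch (the distribution-monad helpers are ours):

```python
from fractions import Fraction as F

def dist_bind(d, f):
    """Monadic bind on finite distributions."""
    out = {}
    for x, p in d.items():
        for y, q in f(x).items():
            out[y] = out.get(y, F(0)) + p * q
    return out

def uniform(xs):
    return {x: F(1, len(xs)) for x in xs}

def step(x):
    # let z = U_{-1,1} in munit(z + x)
    return dist_bind(uniform([-1, 1]), lambda z: {z + x: F(1)})

def lstep2(x):
    # let z = U_{-1,1} in let b = U_{0,1} in munit(x + 2*z*b)
    return dist_bind(uniform([-1, 1]),
                     lambda z: dist_bind(uniform([0, 1]),
                                         lambda b: {x + 2 * z * b: F(1)}))

# Two iterations of step equal one iteration of lstep2, from any start.
for x0 in range(-3, 4):
    assert dist_bind(step(x0), step) == lstep2(x0)
```

Both sides give x−2 and x+2 with probability 1/4 each and x with probability 1/2, which is why position 2i of the walk can be coupled with position i of the lazy walk.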

**Lumped Coupling: Random Walks on 3 and 4 Dimensions.** A Markov chain is *recurrent* if it has probability 1 of returning to its initial state, and *transient* otherwise. It is relatively easy to show that the random walk over Z is recurrent. One can also show that the random walk over Z<sup>2</sup> is recurrent. However, the random walk over Z<sup>3</sup> is transient.

For higher dimensions, we can use a coupling argument to prove transience. Specifically, we can define a coupling between a lazy random walk in n dimensions and a random walk in n+m dimensions, and derive transience of the latter from transience of the former. We define the (lazy) random walks below, and sketch the coupling arguments.

Specifically, we show here the particular case of the transience of the 4 dimensional random walk from the transience of the 3-dimensional lazy random walk. We start by defining the stepping functions:

$$\begin{array}{l} \mathsf{step}\_4 : \mathbb{Z}^4 \to \mathsf{D}(\mathbb{Z}^4) \triangleq \lambda z\_1.\ \mathsf{let}\ x\_1 = \mathcal{U}\_{U\_4}\ \mathsf{in}\ \mathsf{munit}(z\_1 +\_4 x\_1) \\ \mathsf{lstep}\_3 : \mathbb{Z}^3 \to \mathsf{D}(\mathbb{Z}^3) \triangleq \lambda z\_2.\ \mathsf{let}\ x\_2 = \mathcal{U}\_{U\_3}\ \mathsf{in}\ \mathsf{let}\ b\_2 = \mathcal{B}(3/4)\ \mathsf{in}\ \mathsf{munit}(z\_2 +\_3 b\_2 * x\_2) \end{array}$$

where U_i = {(±1, 0, ..., 0), ..., (0, ..., 0, ±1)} are the vectors of the basis of Z^i and their opposites. Then, the random walk of dimension 4 is modelled by rwalk₄ ≜ markov(0, step₄), and the lazy walk of dimension 3 is modelled by lwalk₃ ≜ markov(0, lstep₃). We want to prove:

$$\vdash \mathsf{rwalk}\_4 : \mathsf{D}(\mathrm{Str}\_{\mathbb{Z}^4}) \sim \mathsf{lwalk}\_3 : \mathsf{D}(\mathrm{Str}\_{\mathbb{Z}^3}) \mid \Diamond\_{[y\_1 \leftarrow \mathbf{r}\_1, y\_2 \leftarrow \mathbf{r}\_2]} \mathrm{All}(y\_1, y\_2, z\_1.z\_2.\ \mathrm{pr}\_3^4(z\_1) = z\_2)$$

where pr^{n₁}_{n₂} denotes the standard projection from Z^{n₁} to Z^{n₂}.

We apply the [Markov] rule. The only interesting premise requires proving that the transition function preserves the coupling:

$$p\_2 = \mathrm{pr}\_3^4(p\_1) \vdash \mathsf{step}\_4 \sim \mathsf{lstep}\_3 \mid \forall x\_1 x\_2.\ x\_2 = \mathrm{pr}\_3^4(x\_1) \Rightarrow \Diamond\_{[y\_1 \leftarrow \mathbf{r}\_1\ x\_1,\ y\_2 \leftarrow \mathbf{r}\_2\ x\_2]} \mathrm{pr}\_3^4(y\_1) = y\_2$$

To prove this, we need to find the appropriate coupling, i.e., one that preserves the equality. The idea is that the step in Z<sup>3</sup> must be the projection of the step in Z<sup>4</sup>. This corresponds to the following judgement:

$$\vdash \begin{array}{l} \lambda z\_1.\ \mathsf{let}\ x\_1 = \mathcal{U}\_{U\_4}\ \mathsf{in} \\ \quad \mathsf{munit}(z\_1 +\_4 x\_1) \end{array} \sim \begin{array}{l} \lambda z\_2.\ \mathsf{let}\ x\_2 = \mathcal{U}\_{U\_3}\ \mathsf{in} \\ \quad \mathsf{let}\ b\_2 = \mathcal{B}(3/4)\ \mathsf{in} \\ \quad \mathsf{munit}(z\_2 +\_3 b\_2 * x\_2) \end{array} \ \Bigg|\ \begin{array}{l} \forall z\_1 z\_2.\ \mathrm{pr}\_3^4(z\_1) = z\_2 \Rightarrow \\ \mathrm{pr}\_3^4(\mathbf{r}\_1\ z\_1) \overset{\diamond}{=} \mathbf{r}\_2\ z\_2 \end{array}$$

which by simple equational reasoning is the same as

$$\begin{array}{c|c} \lambda z\_1. \text{ let } x\_1 = \mathcal{U}\_{U\_4} \text{ in } \\ \mathsf{munit}(z\_1 +\_4 x\_1) \end{array} \sim \begin{array}{c|c} \lambda z\_2. \text{ let } p\_2 = \mathcal{U}\_{U\_3} \times \mathcal{B}(^3/4) \text{ in } \\ \mathsf{munit}(z\_2 +\_3 \pi\_1(p\_2) \* \pi\_2(p\_2)) & \mathsf{pr}\_3^4(\mathbf{r}\_1 \ z\_1) \stackrel{\circ}{=} \mathbf{r}\_2 \ z\_2 \end{array}$$

We want to build a coupling such that if we sample (0, 0, 0, 1) or (0, 0, 0, −1) from U_{U₄}, then we sample 0 from B(3/4), and otherwise, if we sample (x₁, x₂, x₃, 0) from U_{U₄}, we sample (x₁, x₂, x₃) from U_{U₃}. Formally, we prove this with the [Coupling] rule. Given X ⊆ U₄, by simple computation we show that:

$$\Pr\_{z\_1 \sim \mathcal{U}\_{U\_4}}[z\_1 \in X] \le \Pr\_{z\_2 \sim \mathcal{U}\_{U\_3} \times \mathcal{B}(^3\!\!/ \_4)}[z\_2 \in \{y \mid \exists x \in X. \mathsf{pr}\_3^4(x) = \pi\_1(y) \* \pi\_2(y)\}]$$

This concludes the proof. From the previous example, it follows that the lazy walk in 3 dimensions is transient, since the random walk in 3 dimensions is transient. By simple reasoning, we now conclude that the random walk in 4 dimensions is also transient.
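The distributional identity at the heart of this lumped coupling, that the projection of a uniform U₄-step is distributed exactly as the lazy U₃-step, can be checked by enumeration. An illustrative standalone sketch (helper names are ours):

```python
from fractions import Fraction as F
from itertools import product

def basis_vectors(n):
    """U_n: the n basis vectors of Z^n and their opposites (2n in total)."""
    vs = []
    for i in range(n):
        for s in (1, -1):
            v = [0] * n
            v[i] = s
            vs.append(tuple(v))
    return vs

def proj(v):
    """pr^4_3: drop the last coordinate."""
    return v[:-1]

# Left: distribution of pr^4_3(x1) for x1 ~ U_{U_4} (8 equiprobable vectors).
left = {}
for v in basis_vectors(4):
    left[proj(v)] = left.get(proj(v), F(0)) + F(1, 8)

# Right: distribution of b2 * x2 for x2 ~ U_{U_3}, b2 ~ B(3/4).
right = {}
for v, (b, q) in product(basis_vectors(3), [(1, F(3, 4)), (0, F(1, 4))]):
    w = tuple(b * c for c in v)
    right[w] = right.get(w, F(0)) + F(1, 6) * q

assert left == right  # the lazy 3D step is exactly the projected 4D step
```

The two vectors (0, 0, 0, ±1) project to the origin, giving total mass 1/4 to "stay put", matching the B(3/4) laziness; each 3-dimensional direction gets mass 1/8 on both sides.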

### **4 Probabilistic Guarded Lambda Calculus**

To ensure that a function on infinite datatypes is well-defined, one must check that it is *productive*. This means that any finite prefix of the output can be computed in finite time. For instance, consider the following function on streams:

$$\mathsf{bad}\ (x :: xs) \triangleq x :: \mathsf{tl}\ (\mathsf{bad}\ xs)$$

This function is not productive, since only the first element can be computed. We can argue this as follows. Suppose that the tail of a stream is available one unit of time after its head, and that x::xs is available at time 0. How much time does it take for bad to start outputting its tail? Assume it takes k units of time. This means that tl(bad xs) will be available at time k + 1, since xs is only available at time 1. But tl(bad xs) is exactly the tail of bad(x::xs), and this is a contradiction, since x::xs is available at time 0 and therefore the tail of bad(x::xs) should be available at time k. Therefore, the tail of bad will never be available.
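The same failure can be observed operationally by transcribing bad naively into a lazy-stream encoding. In the sketch below (an illustrative model, not the calculus: a stream is a pair of a head and a thunk computing the tail), the head of bad's output is available, but forcing its tail diverges, here surfacing as a `RecursionError`:

```python
# A stream is a pair (head, tail_thunk), where tail_thunk() yields a stream.
def ones():
    return (1, ones)  # productive: the head is available immediately

def bad(s):
    """bad (x :: xs) = x :: tl (bad xs), transcribed naively."""
    x, xs = s
    # The tail of the output demands the tail of a recursive call right away:
    return (x, lambda: bad(xs())[1]())

b = bad(ones())
assert b[0] == 1          # the first element can be computed...
try:
    b[1]()                # ...but forcing the tail never terminates
    diverged = False
except RecursionError:
    diverged = True
assert diverged
```

The guarded type discipline described next rules out exactly this definition: tl(bad xs) has type ▷Str, so it cannot be used where a present-time tail is required.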

The guarded lambda calculus solves the productivity problem by distinguishing at the type level between data that is available now and data that will only be available in the future, and by restricting when fixpoints can be defined. Specifically, the guarded lambda calculus extends the usual simply typed lambda calculus with two modalities: ▷ (pronounced *later*) and □ (*constant*). The later modality represents data that will be available one step in the future, and is introduced and removed by the term formers ▷ξ.t and prev, respectively. This modality is used to guard recursive occurrences, so for the calculus to remain productive, we must restrict when it can be eliminated. This is achieved via the constant modality □, which expresses that all the data is available at all times. In the remainder of this section we present a probabilistic extension of this calculus.

*Syntax.* Types of the calculus are defined by the grammar

$$A, B ::= b \mid \mathbb{N} \mid A \times B \mid A + B \mid A \to B \mid \mathrm{Str}\_A \mid \square A \mid \rhd A \mid \mathsf{D}(C)$$

where b ranges over a collection of base types. Str_A is the type of guarded streams of elements of type A. Formally, the type Str_A is isomorphic to A × ▷Str_A. This isomorphism gives a way to introduce streams with the function (::) : A → ▷Str_A → Str_A and to eliminate them with the functions hd : Str_A → A and tl : Str_A → ▷Str_A. D(C) is the type of distributions over *discrete types* C. Discrete types are defined by the following grammar, where b₀ are discrete base types, e.g., Z.

$$C, D ::= b\_0 \mid \mathbb{N} \mid C \times D \mid C + D \mid \mathrm{Str}\_C \mid \rhd C$$

Note that, in particular, arrow types are not discrete but streams are. This is due to the semantics of streams as sets of finite approximations, which we describe in the next subsection. Also note that □Str_A is not discrete, since it makes the full infinite streams available.

We also need to distinguish between arbitrary types A, B and constant types S, T, which are defined by the following grammar

$$S, T ::= b\_C \mid \mathbb{N} \mid S \times T \mid S + T \mid S \to T \mid \square A$$

where b_C ranges over a collection of constant base types. Note in particular that for any type A the type □A is constant.

The terms of the language t are defined by the following grammar

$$\begin{aligned} t ::= {} & x \mid c \mid 0 \mid S\,t \mid \mathsf{case}\ t\ \mathsf{of}\ 0 \mapsto t;\ S\,x \mapsto t \mid \mu \mid \mathsf{munit}(t) \mid \mathsf{let}\ x = t\ \mathsf{in}\ t \\ & \mid \langle t, t \rangle \mid \pi\_1 t \mid \pi\_2 t \mid \mathrm{inj}\_1 t \mid \mathrm{inj}\_2 t \mid \mathsf{case}\ t\ \mathsf{of}\ \mathrm{inj}\_1 x.t;\ \mathrm{inj}\_2 y.t \mid \lambda x.t \mid t\ t \mid \mathsf{fix}\ x.t \\ & \mid t :: ts \mid \mathsf{hd}\ t \mid \mathsf{tl}\ t \mid \mathsf{box}\ t \mid \mathsf{letb}\ x \leftarrow t\ \mathsf{in}\ t \mid \mathsf{letc}\ x \leftarrow t\ \mathsf{in}\ t \mid \rhd \xi.t \mid \mathsf{prev}\ t \end{aligned}$$

where ξ is a delayed substitution, i.e., a sequence of bindings [x₁ ← t₁, ..., x_n ← t_n]. The terms c are constants corresponding to the base types used, and munit(t) and let x = t in t′ are the introduction and sequencing constructs for probability distributions. The meta-variable μ stands for base distributions such as U_C and B(p).

Delayed substitutions were introduced in [13] in a dependent type theory, to be able to work with types dependent on terms of type ▷A. In the setting of a simple type theory, such as the one considered in this paper, delayed substitutions are equivalent to having the applicative structure [14] for the ▷ modality. However, delayed substitutions extend uniformly to the level of propositions, and thus we use them in this paper in place of the applicative structure.

*Denotational Semantics.* The meaning of terms is given by a denotational model in the category S of presheaves over ω, the first infinite ordinal. This category S is also known as the *topos of trees* [15]. In previous work [1], it was shown how to model most of the constructions of the guarded lambda calculus and its internal logic, with the notable exception of the probabilistic features. Below we give an elementary presentation of the semantics.

Informally, the idea behind the topos of trees is to represent (infinite) objects by their finite approximations, which we observe incrementally as time passes. Given an object x, we can consider a sequence {x_i} of its finite approximations observable at time i. These are trivial for finite objects, such as natural numbers, since for any number n, n_i = n at every i. But for infinite objects such as streams, the i-th approximation is the prefix of length i + 1.

Concretely, the category S consists of:

– *objects* X: families of sets {X_i}, i ∈ ℕ, together with restriction functions r_i^X : X_{i+1} → X_i;
– *morphisms* f : X → Y: families of functions f_i : X_i → Y_i that commute with the restriction functions, i.e., f_i ∘ r_i^X = r_i^Y ∘ f_{i+1}.
The full interpretation of types of the calculus can be found in Fig. 8 in the appendix. The main points we want to highlight are:

– Streams over a type A are interpreted as sequences of finite prefixes of elements of A with the restriction functions of A:

$$[\![\mathrm{Str}\_A]\!] \triangleq [\![A]\!]\_0 \times \{\ast\} \xleftarrow{r\_0 \times \cdot} [\![A]\!]\_1 \times [\![\mathrm{Str}\_A]\!]\_0 \xleftarrow{r\_1 \times r\_0 \times \cdot} [\![A]\!]\_2 \times [\![\mathrm{Str}\_A]\!]\_1 \longleftarrow \cdots$$

– Distributions over a discrete object C are defined as a sequence of distributions over each ⟦C⟧_i:

$$[\![\mathsf{D}(C)]\!] \triangleq \mathsf{D}([\![C]\!]\_0) \xleftarrow{\mathsf{D}(r\_0)} \mathsf{D}([\![C]\!]\_1) \xleftarrow{\mathsf{D}(r\_1)} \mathsf{D}([\![C]\!]\_2) \xleftarrow{\mathsf{D}(r\_2)} \cdots$$

where D(⟦C⟧_i) is the set of (probability density) functions μ : ⟦C⟧_i → [0, 1] such that Σ_{x∈⟦C⟧_i} μ(x) = 1, and D(r_i) adds up the probability density of all the points in ⟦C⟧_{i+1} that are sent by r_i to the same point in ⟦C⟧_i. In other words, D(r_i)(μ)(x) = Pr_{y←μ}[r_i(y) = x].
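On finite distributions represented as dictionaries, D(r_i) is the usual pushforward; a small sketch (our own helper, not from the paper):

```python
# Pushforward D(r) of a finite distribution mu along r: the mass of x is
# the total mass of all y with r(y) = x, i.e. Pr_{y <- mu}[r(y) = x].

def pushforward(mu, r):
    out = {}
    for y, p in mu.items():
        out[r(y)] = out.get(r(y), 0.0) + p
    return out
```

For example, pushing the uniform distribution on {0, 1, 2, 3} along y ↦ y mod 2 yields the uniform distribution on {0, 1}.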

An important property of the interpretation is that discrete types are interpreted as objects X such that X_i is finite or countably infinite for every i. This allows us to define distributions on these objects without the need for measure theory. In particular, the type of guarded streams Str_A is discrete provided A is, which is clear from the interpretation of the type Str_A. Conceptually this holds because ⟦Str_A⟧_i is an approximation of real streams, consisting of only the first i + 1 elements.

An object X of S is *constant* if all its restriction functions are bijections. Constant types are interpreted as constant objects of S, and for a constant type A the objects ⟦⊳A⟧ and ⟦A⟧ are isomorphic in S.

*Typing Rules.* Terms are typed under a dual context Δ | Γ, where Γ is a usual context that binds variables to a type, and Δ is a constant context containing variables bound to types that are *constant*. The term letc x ← u in t allows us to shift variables between constant and non-constant contexts. The typing rules can be found in Fig. 2.

The semantics of such a dual context Δ | Γ is given as the product of the types in Δ and Γ, except that we implicitly add □ in front of every type in Δ. In the particular case when both contexts are empty, the semantics of the dual context corresponds to the terminal object 1, which is the singleton set {∗} at each time.

The interpretation of the well-typed term Δ | Γ ⊢ t : A is defined by induction on the typing derivation, and can be found in Fig. 9 in the appendix.

*Applicative Structure of the Later Modality.* As in previous work we can define the operator ⊛ satisfying the typing rule

$$\frac{\Delta \mid \varGamma \vdash t : \rhd (A \to B) \qquad \Delta \mid \varGamma \vdash u : \rhd A}{\Delta \mid \varGamma \vdash t \circledast u : \rhd B}$$

and the equation next(t) ⊛ next(u) ≡ next(t u), by defining the term t ⊛ u ≜ next[f ← t, x ← u]. f x.
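As a rough operational analogy (ours; it deliberately ignores the step-indexed semantics), one can picture later values as thunks: next wraps a value, and ⊛ applies under the thunk, so both sides of the equation force to the same value.

```python
# Rough analogy only: "later" values as thunks, next as thunk creation,
# and the applicative operator as application under the thunk.

def next_(v):
    return lambda: v

def ap(tf, tx):  # tf : later (A -> B), tx : later A
    return lambda: tf()(tx())
```

Forcing ap(next_(f), next_(x)) and next_(f(x)) then produces the same result, mirroring next(t) ⊛ next(u) ≡ next(t u).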

*Example: Modelling Markov Chains.* As an application of and an example of how to use guardedness and probabilities together, we now give the precise definition of the markov construct that we used to model Markov chains earlier:

$$\begin{array}{l} \mathsf{markov} : C \to (C \to \mathsf{D}(C)) \to \mathsf{D}(\mathsf{Str}\_C)\\ \mathsf{markov} \triangleq \mathsf{fix}\ f.\ \lambda x.\ \lambda h.\\ \quad \mathsf{let}\ z = h\ x\ \mathsf{in}\ \mathsf{let}\ t = \mathsf{swap}^{\mathsf{Str}\_C}\_{\rhd\mathsf{D}}(f \circledast \mathsf{next}(z) \circledast \mathsf{next}(h))\ \mathsf{in}\ \mathsf{munit}(x :: t) \end{array}$$

**Fig. 2.** A selection of the typing rules of the guarded lambda calculus. The rules for products, sums, and natural numbers are standard.

The guardedness condition gives f the type ⊳(C → (C → D(C)) → D(Str_C)) in the body of the fixpoint. Therefore, it needs to be applied functorially (via ⊛) to z and h, which gives us a term of type ⊳D(Str_C). To complete the definition we need to build a term of type D(⊳Str_C) and then sequence it with :: to build a term of type D(Str_C). To achieve this, we use the primitive operator swap^C_{⊳D} : ⊳D(C) → D(⊳C), which witnesses the isomorphism between ⊳D(C) and D(⊳C). For this isomorphism to exist, it is crucial that distributions be total (i.e., we cannot use subdistributions). Indeed, the denotation for ⊳D(C) is the sequence {∗} ← D(⟦C⟧_0) ← D(⟦C⟧_1) ← ..., while the denotation for D(⊳C) is the sequence D({∗}) ← D(⟦C⟧_0) ← D(⟦C⟧_1) ← ..., and {∗} is isomorphic to D({∗}) in Set only if D considers only total distributions.
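The finite approximations of D(Str_C) suggest a concrete reading of markov: at stage n, it is a distribution over length-(n + 1) prefixes of the chain. A sketch of this stage-wise unfolding (our own code; the helper name is ours):

```python
# Distribution over length-(n+1) prefixes of the Markov chain started at x,
# with kernel h mapping a state to a dict of successor probabilities.
# This mirrors the n-th finite approximation of D(Str_C).

def markov_approx(x, h, n):
    prefixes = {(x,): 1.0}
    for _ in range(n):
        step = {}
        for pre, p in prefixes.items():
            for y, q in h(pre[-1]).items():
                step[pre + (y,)] = step.get(pre + (y,), 0.0) + p * q
        prefixes = step
    return prefixes
```

For the symmetric random walk h(s) = uniform{s − 1, s + 1} started at 0, the stage-2 approximation assigns probability 1/4 to the prefix (0, 1, 2).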

### **5 Guarded Higher-Order Logic**

We now introduce Guarded HOL (GHOL), which is a higher-order logic to reason about terms of the guarded lambda calculus. The logic is essentially that of [1], but presented with the dual context formulation analogous to the dual-context typing judgement of the guarded lambda calculus. Compared to standard intuitionistic higher-order logic, the logic GHOL has two additional constructs, corresponding to additional constructs in the guarded lambda calculus. These are the later modality (⊳) *on propositions*, with delayed substitutions, which expresses that a proposition holds one time unit into the future, and the "always" modality □, which expresses that a proposition holds at all times. Formulas are defined by the grammar:

$$\phi, \psi ::= \top \mid \phi \land \psi \mid \phi \lor \psi \mid \neg \psi \mid \forall x. \phi \mid \exists x. \phi \mid \rhd \left[ x\_1 \leftarrow t\_1 \dots x\_n \leftarrow t\_n \right]. \phi \mid \Box \phi$$

The basic judgement of the logic is Δ | Σ | Γ | Ψ ⊢ φ, where Σ is a logical context for Δ (that is, a list of formulas well-formed in Δ) and Ψ is another logical context for the dual context Δ | Γ. The formulas in context Σ must be *constant* propositions. We say that a proposition φ is *constant* if it is well-typed in context Δ | · and moreover every occurrence of the later modality in φ is under the □ modality. Selected rules are displayed in Fig. 3. We highlight [Loeb] induction, which is the key to reasoning about fixpoints: to prove that φ holds now, one can assume that it holds in the future. The interpretation of a formula Δ | Γ ⊢ φ is a subobject of the interpretation ⟦Δ | Γ⟧. Concretely, the interpretation A of Δ | Γ ⊢ φ is a family {A_i}_{i=0}^∞ of sets such that A_i ⊆ ⟦Δ | Γ⟧_i. This family must satisfy the property that if x ∈ A_{i+1} then r_i(x) ∈ A_i, where the r_i are the restriction functions of ⟦Δ | Γ⟧. The interpretation of formulas is defined by induction on the typing derivation. In the interpretation of the context Δ | Σ | Γ | Ψ, the formulas in Σ are interpreted with the added □ modality. Moreover, all formulas φ in Σ are typeable as Δ | · ⊢ φ, and thus their interpretations are subobjects of ⟦Δ⟧. We treat these as subobjects of ⟦Δ | Γ⟧ in the obvious way.

The cases for the semantics of the judgement Δ | Γ ⊢ φ can be found in the appendix. It can be shown that this logic is sound with respect to its model in the topos of trees.

**Theorem 2 (Soundness of the semantics).** *The semantics of guarded higher-order logic is sound: if* Δ | Σ | Γ | Ψ ⊢ φ *is derivable, then for all* n ∈ ℕ*,* ⟦□Σ⟧_n ∩ ⟦Ψ⟧_n ⊆ ⟦φ⟧_n*.*

In addition, Guarded HOL is expressive enough to axiomatize standard probabilities over discrete sets. This axiomatization can be used to define the ◊ modality directly in Guarded HOL (as opposed to our relational proof system, where we use it as a primitive). Furthermore, we can derive from this axiomatization additional rules to reason about couplings, which can be seen in Fig. 4. These rules will be the key to proving the soundness of the probabilistic fragment of the relational proof system, and can themselves be shown to be sound.

**Proposition 2 (Soundness of derived rules).** *The additional rules are sound.*

### **6 Relational Proof System**

We complete the formal description of the system by describing the proof rules for the non-probabilistic fragment of the relational proof system (the rules of the probabilistic fragment were described in Sect. 3.2).

### **6.1 Proof Rules**

The rules for core λ-calculus constructs are identical to those of [2]; for convenience, we present a selection of the main rules in Fig. 7 in the appendix.

#### **Fig. 3.** Selected Guarded Higher-Order Logic rules

#### **Fig. 4.** Derived rules for probabilistic constructs

We briefly comment on the two-sided rules for the new constructs (Fig. 5). The notation Ω abbreviates a context Δ | Σ | Γ | Ψ. The rule [Next] relates two terms that have a next constructor at the top level. We require that both have one term in their delayed substitutions and that these are related pairwise. This relation is then used to prove another relation between the main terms. The rule can be generalized to terms with more than one term in the delayed substitution. The rule [Prev] proves a relation between terms by applying prev to both sides of a delayed relation. The rule [Box] proves a relation between two boxed terms if the same relation can be proven in a constant context. Dually, [LetBox] uses a relation between two boxed terms to prove a relation between their unboxings. [LetConst] is similar to [LetBox], but it requires a relation between two constant terms rather than explicitly □-ed terms. The rule [Fix] relates two fixpoints following the [Loeb] rule from Guarded HOL. Notice that in the premise, the fixpoints need to appear in the delayed substitution so that the inductive hypothesis is well-formed. The rule [Cons] proves relations on streams from relations between their heads and tails, while [Head] and [Tail] behave as converses of [Cons].

Figure 6 contains the one-sided versions of the rules. We only present the left-sided versions, as the right-sided versions are completely symmetric. The rule [Next-L] relates at φ a term that has a next constructor at the top level with a term that does not. First, a unary property is proven on the term u in the delayed substitution, and it is then used as a premise to prove φ on the terms with the delays removed. Rules for proving unary judgements can be found in the appendix. Similarly, [LetBox-L] proves a unary property on the term that gets unboxed and then uses it as a precondition. The rule [Fix-L] builds a fixpoint just on the left, and relates it with an arbitrary term t_2 at a property φ. Since φ may contain the variable **r**_2, which is not in the context, it has to be replaced when adding φ to the logical context in the premise of the rule. The remaining rules are similar to their two-sided counterparts.

### **6.2 Metatheory**

We review some of the most interesting metatheoretical properties of our relational proof system, highlighting the equivalence with Guarded HOL.

**Theorem 3 (Equivalence with Guarded HOL).** *For all contexts* Δ, Γ*; types* σ1, σ2*; terms* t1, t2*; sets of assertions* Σ,Ψ*; and assertions* φ*:*

$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t\_1 : \sigma\_1 \sim t\_2 : \sigma\_2 \mid \phi \iff \quad \Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi[t\_1/\mathbf{r}\_1][t\_2/\mathbf{r}\_2]$$

The forward implication follows by induction on the given derivation. The reverse implication is immediate from the rule that allows falling back on Guarded HOL in relational proofs (rule [SUB] in the appendix). The full proof is in the appendix. The consequence of this theorem is that the syntax-directed relational proof system we have built on top of Guarded HOL does not lose expressiveness.

The intended semantics of a judgement Δ | Σ | Γ | Ψ ⊢ t_1 : A_1 ∼ t_2 : A_2 | φ is that, for every valuation δ |= Δ, γ |= Γ, if ⟦□Σ⟧(δ) and ⟦Ψ⟧(δ, γ) hold, then

$$[\![\phi]\!](\delta, \gamma[\mathbf{r}\_1 \leftarrow [\![t\_1]\!](\delta,\gamma), \mathbf{r}\_2 \leftarrow [\![t\_2]\!](\delta,\gamma)])$$

Since Guarded HOL is sound with respect to its semantics in the topos of trees, and our relational proof system is equivalent to Guarded HOL, we obtain that our relational proof system is also sound in the topos of trees.

**Corollary 2 (Soundness and consistency).** *If* Δ | Σ | Γ | Ψ ⊢ t_1 : σ_1 ∼ t_2 : σ_2 | φ*, then for every valuation* δ |= Δ*,* γ |= Γ*:*

$$\begin{aligned} &[\![\Delta \vdash \Box\Sigma]\!](\delta) \land [\![\Delta \mid \Gamma \vdash \Psi]\!](\delta, \gamma) \Rightarrow\\ &\quad [\![\Delta \mid \Gamma, \mathbf{r}\_1 : \sigma\_1, \mathbf{r}\_2 : \sigma\_2 \vdash \phi]\!](\delta, \gamma[\mathbf{r}\_1 \leftarrow [\![\Delta \mid \Gamma \vdash t\_1]\!](\delta, \gamma)][\mathbf{r}\_2 \leftarrow [\![\Delta \mid \Gamma \vdash t\_2]\!](\delta, \gamma)]) \end{aligned}$$

*In particular, there is no proof of* Δ | ∅ | Γ | ∅ ⊢ t_1 : σ_1 ∼ t_2 : σ_2 | ⊥.


**Fig. 5.** Two-sided rules for Guarded RHOL

#### **6.3 Shift Couplings Revisited**

We give further details on how to prove the example with shift couplings from Sect. 3.3. (Additional examples of relational reasoning about non-probabilistic streams can be found in the appendix.) Recall the step functions:

$$\begin{array}{l} \mathsf{step} \triangleq \lambda x.\ \mathsf{let}\ z = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in}\ \mathsf{munit}(z + x)\\ \mathsf{lstep2} \triangleq \lambda x.\ \mathsf{let}\ z = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in}\ \mathsf{let}\ b = \mathcal{U}\_{\{0,1\}}\ \mathsf{in}\ \mathsf{munit}(x + 2 \ast z \ast b) \end{array}$$

We axiomatize the predicate All_{2,1}, which relates the element at position 2i in one stream to the element at position i in another stream, as follows:

$$\begin{array}{l} \forall x\_1\, x\_2\, xs\_1\, xs\_2\, y\_1.\ \phi[x\_1/z\_1][x\_2/z\_2] \Rightarrow\\ \quad \rhd[ys\_1 \leftarrow xs\_1].\rhd[zs\_1 \leftarrow ys\_1, ys\_2 \leftarrow xs\_2].\ \mathrm{All}\_{2,1}(zs\_1, ys\_2, z\_1.z\_2.\phi) \Rightarrow\\ \quad \mathrm{All}\_{2,1}(x\_1 :: y\_1 :: xs\_1,\ x\_2 :: xs\_2,\ z\_1.z\_2.\phi) \end{array}$$

In fact, we can assume that, in general, we have a family of All_{m_1,m_2} predicates relating two streams at positions m_1 · i and m_2 · i for every i.
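On finite prefixes, the intended reading of All_{m1,m2} can be sketched as follows (a checker of ours, not part of the logic):

```python
# Finite-prefix reading of All_{m1,m2}: phi must relate the element at
# position m1*i of xs to the element at position m2*i of ys, for every i
# at which both positions exist.

def all_mn(xs, ys, m1, m2, phi):
    i = 0
    while m1 * i < len(xs) and m2 * i < len(ys):
        if not phi(xs[m1 * i], ys[m2 * i]):
            return False
        i += 1
    return True
```

For instance, the stream 0, 1, 2, ... and the stream 0, 2, 4, ... satisfy All_{2,1} with φ taken to be equality.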

**Fig. 6.** One-sided rules for Guarded RHOL

We can now express the existence of a shift coupling by the statement:

$$p\_1 = p\_2 \vdash \mathsf{markov}(p\_1, \mathsf{step}) \sim \mathsf{markov}(p\_2, \mathsf{lstep2}) \mid \Diamond\_{[y\_1 \leftarrow \mathbf{r}\_1,\, y\_2 \leftarrow \mathbf{r}\_2]}\ \mathrm{All}\_{2,1}(y\_1, y\_2, z\_1.z\_2.\, z\_1 = z\_2)$$

For the proof, we need to introduce an asynchronous rule for Markov chains:

$$\dfrac{\begin{array}{c} \Omega \vdash t\_1 : C\_1 \sim t\_2 : C\_2 \mid \phi\\ \Omega \vdash (\lambda x\_1.\ \mathsf{let}\ x\_1' = h\_1\, x\_1\ \mathsf{in}\ h\_1\, x\_1') : C\_1 \to \mathsf{D}(C\_1) \sim h\_2 : C\_2 \to \mathsf{D}(C\_2) \mid\\ \forall x\_1\, x\_2.\ \phi[x\_1/z\_1][x\_2/z\_2] \Rightarrow \Diamond\_{[z\_1 \leftarrow \mathbf{r}\_1\, x\_1,\ z\_2 \leftarrow \mathbf{r}\_2\, x\_2]}\ \phi \end{array}}{\begin{array}{c} \Omega \vdash \mathsf{markov}(t\_1, h\_1) : \mathsf{D}(\mathrm{Str}\_{C\_1}) \sim \mathsf{markov}(t\_2, h\_2) : \mathsf{D}(\mathrm{Str}\_{C\_2}) \mid\\ \Diamond\_{[y\_1 \leftarrow \mathbf{r}\_1,\ y\_2 \leftarrow \mathbf{r}\_2]}\ \mathrm{All}\_{2,1}(y\_1, y\_2, z\_1.z\_2.\phi) \end{array}}\ \textbf{Markov-2-1}$$

This asynchronous rule for Markov chains shares the motivations of the rule for loops proposed in [6]. Note that one can define a rule [Markov-m-n] for arbitrary m and n to prove a judgement of the form Allm,n on two Markov chains.

We show the proof of the shift coupling. By equational reasoning, we get:

$$\begin{array}{rl} \lambda x\_1.\ \mathsf{let}\ x\_1' = h\_1\, x\_1\ \mathsf{in}\ h\_1\, x\_1' &\equiv\ \lambda x\_1.\ \mathsf{let}\ z\_1 = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in}\ h\_1(z\_1 + x\_1)\\ &\equiv\ \lambda x\_1.\ \mathsf{let}\ z\_1 = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in}\ \mathsf{let}\ z\_1' = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in}\ \mathsf{munit}(z\_1' + z\_1 + x\_1) \end{array}$$

and the only interesting premise of [Markov-2-1] is:

$$\begin{array}{ccc} \lambda x\_1.\ \mathsf{let}\ z\_1 = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in} & & \lambda x\_2.\ \mathsf{let}\ z\_2 = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in}\\ \quad \mathsf{let}\ z\_1' = \mathcal{U}\_{\{-1,1\}}\ \mathsf{in} & \sim & \quad \mathsf{let}\ b\_2 = \mathcal{U}\_{\{0,1\}}\ \mathsf{in}\\ \quad \mathsf{munit}(z\_1' + z\_1 + x\_1) & & \quad \mathsf{munit}(x\_2 + 2 \ast b\_2 \ast z\_2) \end{array} \ \Big|\ \forall x\_1\, x\_2.\ x\_1 = x\_2 \Rightarrow \Diamond\_{[z\_1 \leftarrow \mathbf{r}\_1\, x\_1,\ z\_2 \leftarrow \mathbf{r}\_2\, x\_2]}\ z\_1 = z\_2$$

Couplings between z_1 and z_2 and between z_1' and b_2 can be found by simple computations. This completes the proof.
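These "simple computations" can be checked by brute-force enumeration (our own check): the sum of two independent uniform ±1 steps has exactly the distribution of 2 · z · b with z uniform on {−1, 1} and b uniform on {0, 1}, which is what the coupling needs.

```python
# Brute-force check that the two programs related above produce
# identical output distributions (so the identity coupling exists).
from collections import Counter
from fractions import Fraction

def dist(outcomes):
    c = Counter(outcomes)
    return {k: Fraction(v, len(outcomes)) for k, v in c.items()}

# Left program: two independent steps z1, z1' drawn from {-1, 1}.
left = dist([z1 + z1p for z1 in (-1, 1) for z1p in (-1, 1)])
# Right program: one step 2 * z2 * b2 with z2 in {-1, 1}, b2 in {0, 1}.
right = dist([2 * z2 * b2 for z2 in (-1, 1) for b2 in (0, 1)])
```

Both sides give probability 1/2 to 0 and 1/4 to each of −2 and 2.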

### **7 Related Work**

Our probabilistic guarded λ-calculus and the associated logic Guarded HOL build on top of the guarded λ-calculus and its internal logic [1]. The guarded λ-calculus has been extended to guarded dependent type theory [13], which can be understood as a theory of guarded refinement types and as a foundation for proof assistants based on guarded type theory. These systems do not reason about probabilities, and do not support syntax-directed (relational) reasoning, both of which we support.

Relational models for higher-order programming languages are often defined using logical relations. [16] showed how to use second-order logic to define and reason about logical relations for the second-order lambda calculus. Recent work has extended this approach to logical relations for higher-order programming languages with computational effects such as nontermination, general references, and concurrency [17–20]. The logics used in *loc. cit.* are related to our work in two ways: (1) the logics in *loc. cit.* make use of the later modality for reasoning about recursion, and (2) the models of the logics in *loc. cit.* can in fact be defined using guarded type theory. Our work is more closely related to Relational Higher Order Logic [2], which applies the idea of logic-enriched type theories [21,22] to a relational setting. There exist alternative approaches for reasoning about relational properties of higher-order programs; for instance, [23] have recently proposed to use monadic reification for reducing relational verification of F<sup>∗</sup> to proof obligations in higher-order logic.

A series of work develops reasoning methods for probabilistic higher-order programs for different variations of the lambda calculus. One line of work has focused on operationally-based techniques for reasoning about contextual equivalence of programs. The methods are based on probabilistic bisimulations [24,25] or on logical relations [26]. Most of these approaches have been developed for languages with discrete distributions, but recently there has also been work on languages with continuous distributions [27,28]. Another line of work has focused on denotational models, starting with the seminal work in [29]. Recent work includes support for relational reasoning about equivalence of programs with continuous distributions for a total programming language [30]. Our approach is most closely related to prior work based on relational refinement types for higher-order probabilistic programs. These were initially considered by [31] for a stateful fragment of F∗, and later by [32,33] for a pure language. Both systems are specialized to building probabilistic couplings; however, the latter support approximate probabilistic couplings, which yield a natural interpretation of differential privacy [34], both in its vanilla and approximate forms (i.e., ε- and (ε, δ)-privacy). Technically, approximate couplings are modelled as a graded monad, where the index of the monad tracks the privacy budget (ε or (ε, δ)). Both systems are strictly syntax-directed, and cannot reason about computations that have different types or syntactic structures, while our system can.

### **8 Conclusion**

We have developed a probabilistic extension of the (simply typed) guarded λ-calculus, and proposed a syntax-directed proof system for relational verification. Moreover, we have verified a series of examples that are beyond the reach of prior work. Finally, we have proved the soundness of the proof system with respect to the topos of trees.

There are several natural directions for future work. One first direction is to enhance the expressiveness of the underlying simply typed language. For instance, it would be interesting to introduce clock variables and some type dependency as in [13], and extend the proof system accordingly. This would allow us, for example, to type the function taking the n-th element of a *guarded* stream, which cannot be done in the current system. Another exciting direction is to consider approximate couplings, as in [32,33], and to develop differential privacy for infinite streams—preliminary work in this direction, such as [35], considers very large lists, but not arbitrary streams. A final direction would be to extend our approach to continuous distributions to support other application domains.

**Acknowledgments.** We would like to thank the anonymous reviewers for their time and their helpful input. This research was supported in part by the ModuRes Sapere Aude Advanced Grant from The Danish Council for Independent Research for the Natural Sciences (FNU), by a research grant (12386, Guarded Homotopy Type Theory) from the VILLUM foundation, and by NSF under grant 1718220.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Types and Effects

## Failure is Not an Option An Exceptional Type Theory

Pierre-Marie Pédrot1(B) and Nicolas Tabareau<sup>2</sup>

<sup>1</sup> MPI-SWS, Saarbrücken, Germany ppedrot@mpi-sws.org <sup>2</sup> Inria, Nantes, France nicolas.tabareau@inria.fr

Abstract. We define the *exceptional translation*, a syntactic translation of the Calculus of Inductive Constructions (CIC) into itself, that covers full dependent elimination. The new resulting type theory features call-by-name exceptions with decidable type-checking and canonicity, but at the price of inconsistency. Then, noticing that parametricity amounts to Kreisel's realizability in this setting, we provide an additional layer on top of the exceptional translation in order to tame exceptions and ensure that all exceptions used locally are caught, leading to the *parametric exceptional translation* which fully preserves consistency. This way, we can consistently extend the logical expressivity of CIC with independence of premises, Markov's rule, and the negation of function extensionality while retaining η-expansion. As a byproduct, we also show that Markov's principle is not provable in CIC. Both translations have been implemented in a Coq plugin, which we use to formalize the examples.

### 1 Introduction

Monadic translations constitute a canonical way to add effects to pure functional languages [1]. Until recently, this technique was not available for type theories such as CIC because of complex interactions with dependency. In a recent paper [2], we presented a generic way to extend the monadic translation to dependent types, using the *weaning translation*, as soon as the monad under consideration satisfies a crucial property: being self-algebraic. Indeed, in the same way that the universe of types □_i is itself a type (of a higher universe) in type theory, the type of algebras of a monad T

$$
\Sigma A: \Box\_i. \mathbf{T} \ A \to A.
$$

needs to be itself an algebra of the monad to allow a correct translation of the universe. However, in general, the weaning translation does not interpret all of CIC because dependent elimination needs to be restricted to linear predicates, that is, those that are intuitively call-by-value [3]. In this paper, we study the particular case of the error monad, and show that its weaning translation can be simplified and tweaked so that full dependent elimination is valid.

This *exceptional translation* gives rise to a novel extension of CIC with new computational behaviours, namely call-by-name exceptions.<sup>1</sup> That is, the type theory induced by the exceptional translation features new operations to raise and catch exceptions. This new logical expressivity comes at a cost, as the resulting theory is not consistent anymore, although it remains computationally relevant. This means that it is possible to prove a contradiction, but, thanks to a weak form of canonicity, only because of an unhandled exception. Furthermore, the translation allows us to reason directly in CIC on terms of the exceptional theory, letting us prove, e.g., that assuming some properties on its input, an exceptional function actually never raises an exception. We thus have a sound logical framework to prove safety properties about impure dependently-typed programs.

We then push this technique further by noticing that parametricity provides a systematic way to describe that a term is not allowed to produce uncaught exceptions, bridging the gap between Kreisel's modified realizability [4] and parametricity inside type theory [5]. This *parametric exceptional translation* ensures that no exception reaches toplevel, thus ensuring consistency of the resulting theory. Pure terms are automatically handled, while it is necessary to show parametricity manually for terms internally using exceptions. We exploit this computational extension of CIC to show various logical results over CIC.

### *Contributions*


*Plan of the Paper.* In Sect. 2, we describe the exceptional translation and the resulting new computational principles arising from it. In Sect. 3, we present the parametric variant of the exceptional translation. Section 4 is devoted to the

<sup>1</sup> The fact that the resulting exceptions are call-by-name is explained in detail in [2] using a call-by-push-value decomposition. Intuitively, it comes from the fact that CIC is naturally call-by-name.

<sup>2</sup> The plugin is available at https://github.com/CoqHott/exceptional-tt.

$$A, B, M, N ::= \Box\_i \mid x \mid M\ N \mid \lambda x : A.\ M \mid \Pi x : A.\ B \qquad\qquad \Gamma, \Delta ::= \cdot \mid \Gamma, x : A$$

$$\frac{\vdash \Gamma \quad i < j}{\Gamma \vdash \Box\_i : \Box\_j} \qquad \frac{\Gamma \vdash M : B \quad \Gamma \vdash A : \Box\_i}{\Gamma, x : A \vdash M : B} \qquad \frac{\Gamma \vdash A : \Box\_i \quad \Gamma, x : A \vdash B : \Box\_j}{\Gamma \vdash \Pi x : A.\ B : \Box\_{\max(i,j)}}$$

$$\frac{\Gamma \vdash M : B \quad \Gamma \vdash A : \Box\_i \quad A \equiv B}{\Gamma \vdash M : A} \qquad \frac{\Gamma, x : A \vdash M : B \quad \Gamma \vdash \Pi x : A.\ B : \Box\_i}{\Gamma \vdash \lambda x : A.\ M : \Pi x : A.\ B} \qquad \frac{\Gamma \vdash M : \Pi x : A.\ B \quad \Gamma \vdash N : A}{\Gamma \vdash M\ N : B\{x := N\}}$$

$$\frac{}{\vdash \cdot} \qquad \frac{\Gamma \vdash A : \Box\_i}{\vdash \Gamma, x : A} \qquad \frac{\Gamma \vdash A : \Box\_i}{\Gamma, x : A \vdash x : A} \qquad (\lambda x : A.\ M)\ N \equiv M\{x := N\} \quad \text{(congruence rules omitted)}$$

Fig. 1. Typing rules of CC<sup>ω</sup>

various logical results resulting from the parametric exceptional translations. In Sect. 5, we discuss possible extensions of the translation with negative records and an impredicative universe. Section 6 describes the Coq plugin and illustrates its use on a concrete example. We discuss related work in Sect. 7 and conclude in Sect. 8.

### 2 The Exceptional Translation

We define in this section the exceptional translation as a syntactic translation between type theories. We call the target theory T, upon which we will make various assumptions depending on the objects we want to translate.

### 2.1 Adding Exceptions to **CC**<sub>ω</sub>

In this section, we describe the exceptional translation over a purely negative theory, *i.e.*, featuring only universes and dependent functions, called CC_ω, which is presented in Fig. 1. This theory is a predicative version of the Calculus of Constructions [8], with an infinite hierarchy of universes □_i instead of one impredicative sort. We assume from now on that T contains at least CC_ω itself.

The exceptional translation is a simplification of the weaning translation [2] applied to the error monad. Because it is specifically tailored to exceptions, it admits a more compact presentation.

Let E : □_0 be a fixed type of exceptions in T. The weaning translation for the error monad amounts to interpreting types as algebras, *i.e.*, as inhabitants of the dependent sum ΣA : □_i. (A + E) → A. In this paper, we take advantage of the fact that the algebra morphism restricted to A is always the identity. Thus every type simply comes with a way to interpret failure on this type, i.e., types are intuitively interpreted as a pair of an A : □_i with a default (raise) function A_∅ : E → A. In practice, it is slightly more complicated, as the universe of types is itself a type, so its interpretation must come with a default function. We overcome this issue by assuming a term type_i, representing types that can raise exceptions. This type comes with two constructors: TypeVal_i, which allows constructing a type_i from a type and a default function on this type; and TypeErr_i, which represents the default function at the level of type_i. Furthermore, type_i is equipped with an eliminator type_elim_i and can thus be thought of as an inductive definition. For simplicity, we axiomatize it instead of requiring inductive types in the target of the translation.

Definition 1. *We assume that* T *features the data below, where* i, j *indices stand for universe polymorphism.*

*–* Ω_i : E → □_i
*–* ω_i : Πe : E. Ω_i e
*–* type_i : □_j, *where* i < j
*–* TypeVal_i : ΠA : □_i. (E → A) → type_i
*–* TypeErr_i : E → type_i
*–* type_elim_{i,j} : ΠP : type_i → □_j. (Π(A : □_i) (A_∅ : E → A). P (TypeVal_i A A_∅)) → (Πe : E. P (TypeErr_i e)) → ΠT : type_i. P T

*subject to the following definitional equations:*

$$\begin{array}{l} \mathsf{type\\_elim}\_{i,j}\ P\ p\_v\ p\_\emptyset\ (\mathsf{TypeVal}\_i\ A\ A\_\emptyset) \equiv p\_v\ A\ A\_\emptyset\\ \mathsf{type\\_elim}\_{i,j}\ P\ p\_v\ p\_\emptyset\ (\mathsf{TypeErr}\_i\ e) \equiv p\_\emptyset\ e \end{array}$$

The Ω term describes what it means for a type to fail, i.e. it ascribes a meaning to sequents of the form <sup>Γ</sup> <sup>M</sup> : fail <sup>e</sup>. In practice, it is irrelevant and can be chosen to be degenerate, e.g. Ω := <sup>λ</sup>\_ : <sup>E</sup>. unit.

In what follows, we often leave the universe indices implicit although they can be retrieved at the cost of more explicit annotations.

Before defining the exceptional translation we need to derive a term El<sup>3</sup> that recovers the underlying type from an inhabitant of type and Err that lifts the default function to this underlying type.

Definition 2. *From the data of Definition 1, we derive the following terms.*

$$\begin{array}{ll} \mathsf{El}\_i &: \mathsf{type}\_i \to \Box\_i\\ &:= \lambda A : \mathsf{type}\_i.\ \mathsf{type\\_elim}\ (\lambda T : \mathsf{type}\_i.\ \Box\_i)\ (\lambda (A\_0 : \Box\_i)\ (A\_\emptyset : \mathbb{E} \to A\_0).\ A\_0)\ \Omega\ A\\ \mathsf{Err}\_i &: \Pi A : \mathsf{type}\_i.\ \mathbb{E} \to \mathsf{El}\_i\ A\\ &:= \lambda (A : \mathsf{type}\_i)\ (e : \mathbb{E}).\ \mathsf{type\\_elim}\ \mathsf{El}\_i\ (\lambda (A\_0 : \Box\_i)\ (A\_\emptyset : \mathbb{E} \to A\_0).\ A\_\emptyset\ e)\ \omega\ A \end{array}$$

<sup>3</sup> The notation El refers to universes à la Tarski in Martin-Löf type theory.

```
[□i]         := TypeVal typei TypeErri
[x]          := x
[λx : A. M]  := λx : [[A]]. [M]
[M N]        := [M] [N]
[Πx : A. B]  := TypeVal (Πx : [[A]]. [[B]]) (λ(e : E) (x : [[A]]). [B]∅ e)
[A]∅         := Err [A]
[[A]]        := El [A]
[[·]]        := ·
[[Γ, x : A]] := [[Γ]], x : [[A]]
```

Fig. 2. Exceptional translation

The exceptional translation is defined in Fig. 2. As usual for syntactic translations [9], the term translation is given by [·] and the type translation, written [[·]], is derived from it using the function El. There is an additional macro [·]_∅, defined using Err_i, which provides the canonical way to inhabit a given type from an exception.
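To make the two-component reading of translated types concrete, here is a minimal Python sketch, with the exception type E fixed to str. The encodings are ours, purely for illustration: a translated type is represented only by the extra datum TypeVal packs with it, namely its default function, so Err is just application, and the Π-case of Fig. 2 fails pointwise in the codomain.

```python
# Illustrative sketch of the [.]_∅ / Err discipline of Fig. 2, with E := str.
# A translated type is modelled by its default function E -> A (the datum
# added by TypeVal); these encodings are ours, not the paper's.

def err(ty, e):
    """Err: inhabit a translated type from an exception."""
    return ty(e)

def arrow(dom_ty, cod_ty):
    """Default function of a translated function type: as in the Pi case
    of Fig. 2, it ignores its argument and fails in the codomain."""
    return lambda e: (lambda _x: err(cod_ty, e))

# bool with an exception constructor freely added (cf. Sect. 2.2)
def bool_err(e):
    return ("bool_err", e)

print(err(bool_err, "boom"))                      # ('bool_err', 'boom')
print(err(arrow(bool_err, bool_err), "e")(True))  # ('bool_err', 'e')
```

The second call mirrors the definitional equation raise (Πx : A. B) e ≡ λx : A. raise B e discussed in Sect. 2.4: failing at a function type produces a function that fails.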

Note that we will often slightly abuse the translation and use the [·] and [[·]] notation as macros acting on the target theory. This is merely for readability purposes, and the corresponding uses are easily expanded to the actual term.

The following lemma makes explicit how [[·]] and [·] ∅ behave on universes and on the dependent function space.

### Lemma 3 (Unfoldings). *The following definitional equations hold:*

*–* [[□_i]] ≡ type_i
*–* [[Πx : A. B]] ≡ Πx : [[A]]. [[B]]
*–* [□_i]_∅ e ≡ TypeErr_i e
*–* [Πx : A. B]_∅ e ≡ λx : [[A]]. [B]_∅ e

*Proof.* By unfolding and straightforward reductions.

The soundness of the translation follows from the following properties, which are fundamental but straightforward to prove.

#### Theorem 4 (Soundness). *The following properties hold.*

*–* [M{x := N}] ≡ [M]{x := [N]} *(substitution lemma).*
*– If* M ≡ N *then* [M] ≡ [N] *(conversion lemma).*
*– If* Γ ⊢ M : A *then* [[Γ]] ⊢ [M] : [[A]] *(typing soundness).*
*– If* Γ ⊢ A : □ *then* [[Γ]] ⊢ [A]_∅ : E → [[A]] *(exception soundness).*

*Proof.* The first property is by routine induction on M, the second follows directly by induction on the conversion derivation. The third is by induction on the typing derivation, the most important rule being ⊢ □_i : □_j, which holds because [□_i] ≡ TypeVal type_i TypeErr_i has type type_j, which is convertible to [[□_j]] by Lemma 3. The last property is a direct application of typing soundness and the unfolding of Lemma 3 for universes.

We call T<sup>E</sup> the theory arising from this interpretation, which is formally defined in a way similar to standard categorical constructions over dependent type theory. Terms and contexts of T<sup>E</sup> are simply terms and contexts of T. A context Γ is valid in T<sup>E</sup> whenever its translation [[Γ]] is valid in T. Two terms M and N are convertible in T<sup>E</sup> whenever their translations [M] and [N] are convertible in T. Finally, Γ ⊢ M : A holds in T<sup>E</sup> whenever [[Γ]] ⊢ [M] : [[A]] holds in T.

In particular, it is possible to extend T<sup>E</sup> with a new constant c of a given type A by providing an inhabitant c<sup>E</sup> of the translated type [[A]]. The translation is then extended with [c] := c<sup>E</sup>. The computational rules satisfied by this new constant are directly given by the computational rules satisfied by its translation. In some sense, the new constant c is just syntactic sugar for c<sup>E</sup>. Using T<sup>E</sup>, Theorem 4 can be rephrased in the following way.

Theorem 5. *If* T *interprets* CC<sup>ω</sup> *then so does* T<sup>E</sup>*, that is, the exceptional translation is a syntactic model of* CCω*.*

### 2.2 Exceptional Inductive Types

The fact that the only effect we consider is raising exceptions does not really affect the negative fragment when compared to our previous work [2], but it really shines when it comes to interpreting inductive datatypes. Indeed, as explained in the introduction, the weaning translation only interprets a subset of CIC, restricting dependent elimination to linear predicates. Furthermore, it also requires a few syntactic properties of the underlying monad ensuring that positivity criteria are preserved through the translation, which can sometimes be hard to obtain.

The exceptional translation diverges from the weaning translation precisely on inductive types. It allows a more compact translation of the latter, while at the same time providing a complete interpretation of CIC, that is, including full dependent elimination.

From now on, we assume that the target theory is a predicative restriction of CIC, i.e. that we can construct in it new inductive datatypes as we do in e.g. Coq [10], but without considering an impredicative universe. That is, all the inductive types we consider in this section live in □_i. As a matter of fact, we slightly abuse the usual nomenclature and simply call CIC this predicative fragment in the remainder of the paper. We refrain from describing the generic typing rules that extend CC<sup>ω</sup> into CIC, as they are fairly standard and would take up too much space. See for instance Werner's thesis for a comprehensive presentation [11].

$$
\begin{aligned}
[\mathcal{I}] &:= \lambda (p_1 : [[P_1]]) \ldots (p_n : [[P_n]])\ (i_1 : [[I_1]]) \ldots (i_m : [[I_m]]).\\
&\qquad \mathsf{TypeVal}\ (\mathcal{I}^\bullet\ p_1 \ldots p_n\ i_1 \ldots i_m)\ (\mathcal{I}_{\varnothing}\ p_1 \ldots p_n\ i_1 \ldots i_m)\\
[c_1] &:= c_1^\bullet\\
&\ \ldots\\
[c_k] &:= c_k^\bullet
\end{aligned}
$$

Fig. 3. Inductive type translation

Type and Constructor Translation. As explained before, the intuitive interpretation of a type through the exceptional translation is a pair of a type and a default function from exceptions into that type. In particular, when translating some inductive type I, we must come up with a type [[I]] together with a default function <sup>E</sup> <sup>→</sup> [[I]]. As soon as <sup>E</sup> is inhabited, that means that we need [[I]] to be inhabited, preferably in a canonical way. The solution is simple: just as for types where we freely added the exceptional case by means of the TypeErr constructor, we freely add exceptions to every inductive type.

In practice, there is an elegant and simple way to do this. It just consists in translating constructors pointwise, while adding a new dedicated constructor standing for the exceptional case. We now turn to the formal construction.

Definition 6. *Let* I *be an inductive datatype with*

*– parameters* p_1 : P_1, ..., p_n : P_n*;*
*– indices* i_1 : I_1, ..., i_m : I_m*;*
*– constructors*
  c_1 : Π(a_{1,1} : A_{1,1}) ... (a_{1,l_1} : A_{1,l_1}). I p_1 ... p_n V_{1,1} ... V_{1,m}
  ...
  c_k : Π(a_{k,1} : A_{k,1}) ... (a_{k,l_k} : A_{k,l_k}). I p_1 ... p_n V_{k,1} ... V_{k,m}

*We define the exceptional translation of* I *and its constructors in Fig. 3, where* I• *is the inductive type defined by*

*– parameters* p_1 : [[P_1]], ..., p_n : [[P_n]]*;*
*– indices* i_1 : [[I_1]], ..., i_m : [[I_m]]*;*
*– constructors*
  c_1• : Π(a_{1,1} : [[A_{1,1}]]) ... (a_{1,l_1} : [[A_{1,l_1}]]). I• p_1 ... p_n [V_{1,1}] ... [V_{1,m}]
  ...
  c_k• : Π(a_{k,1} : [[A_{k,1}]]) ... (a_{k,l_k} : [[A_{k,l_k}]]). I• p_1 ... p_n [V_{k,1}] ... [V_{k,m}]
  I∅ : Π(i_1 : [[I_1]]) ... (i_m : [[I_m]]). E → I• p_1 ... p_n i_1 ... i_m

*where in the recursive calls in the various* A*, we locally set*

$$[\mathcal{I}\ M_1 \ldots M_n\ N_1 \ldots N_m] := \mathcal{I}^\bullet\ [M_1] \ldots [M_n]\ [N_1] \ldots [N_m].$$

*Example 7.* We give a few representative examples of the inductive translation in Fig. 4 in a Coq-like syntax. They were chosen because they are simple instances of inductive types featuring parameters, indices and recursion in an orthogonal way. For convenience, we write Σ A (λx : A. B) as Σx : A. B.

```
Ind bool : □ :=                   Ind bool• : □ :=
                                   | true•  : bool•
                                   | false• : bool•
                                   | bool∅  : E → bool•

Ind list (A : □) : □ :=           Ind list• (A : [[□]]) : □ :=
                                   | nil•  : list• A
                                   | cons• : [[A]] → list• A → list• A
                                   | list∅ : E → list• A

Ind Σ (A : □) (B : A → □) : □ :=  Ind Σ• (A : [[□]]) (B : [[A]] → □) : □ :=
                                   | ex• : Π(x : [[A]]) (y : [[B x]]). Σ• A B
                                   | Σ∅  : E → Σ• A B

Ind eq (A : □) (x : A) : A → □ := Ind eq• (A : [[□]]) (x : [[A]]) : [[A]] → □ :=
                                   | refl• : eq• A x x
                                   | eq∅   : Πy : [[A]]. E → eq• A x y
```
Fig. 4. Examples of translations of inductive types
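The translations of Fig. 4 can be mimicked in any language with tagged unions. The following Python sketch uses an encoding of our own devising, with E fixed to str: each source constructor is kept pointwise and one extra constructor carrying an exception is freely added, together with the default function required by TypeVal.

```python
# Tagged-union rendering of the translated bool and list of Fig. 4
# (illustrative encoding, E := str).

def true_():       return ("true",)
def false_():      return ("false",)
def bool_err(e):   return ("bool_err", e)     # the added bool∅ constructor

def nil():         return ("nil",)
def cons(x, tl):   return ("cons", x, tl)
def list_err(e):   return ("list_err", e)     # the added list∅ constructor

# The default functions E -> bool• and E -> list• A packed by TypeVal:
bool_default = bool_err
list_default = list_err

print(cons(1, cons(2, nil())))   # ('cons', 1, ('cons', 2, ('nil',)))
print(list_default("boom"))      # ('list_err', 'boom')
```

Note how the error constructor sits at the top level only: a list that fails in its tail, such as `cons(1, list_err("boom"))`, is still a `cons`, which is the call-by-name behaviour discussed below.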

*Remark 8.* The fact that we locally override the translation for recursive calls on the [[·]] translation of the type being defined means that we cannot handle cases where the translation of the type of a constructor actually contains an instance of [I]. Because of the syntactic positivity criterion, the only possibility for such a situation to occur in CIC is in so-called nested inductive definitions. However, nested inductive types are essentially a programming convenience, as most nested types can be rewritten in an isomorphic way that is not nested.

Lemma 9. *If* <sup>I</sup> *is given as in Definition 6, we have for any terms* M *,* N

[[I M<sup>1</sup> ... M<sup>n</sup> N<sup>1</sup> ... Nm]] ≡ I• [M1] ... [Mn] [N1] ... [Nm].

This justifies a posteriori the simplified local definition we used in the recursive calls of the translation of the constructors.

Theorem 10. *For any inductive type* I *not using nested inductive types, the translation from Definition 6 is well-typed and satisfies the positivity criterion.*

*Proof.* Preservation of typing is a consequence of Theorem 4. The restriction on nested types, which is slightly stronger than the usual positivity criterion of CIC, is due to the fact that I<sup>∅</sup> is not available in the recursive calls and thus cannot be used to build a term of type type via the TypeVal constructor.

Preservation of the positivity criterion is straightforward, as the shape of every constructor c<sup>k</sup> is preserved, and furthermore by Lemma 3 the structure of every argument type is preserved by [[·]] as well. The only additional constructor I<sup>∅</sup> does not mention the recursive type and is thus automatically positive.

Corollary 11. *Type soundness holds for the translation of inductive types and their constructors.*

Pattern-Matching Translation. We now turn to the translation of the elimination of inductive terms, that is, pattern matching. Once again, its definition originates from the fact that we are working with call-by-name exceptions. It is well-known that in call-by-name, pattern matching implements a delimited form of call-by-value, by forcing its scrutinee before proceeding, at least up to the head constructor. Therefore, as soon as the matched term (re-)raises an exception, the whole pattern-matching reraises the same exception. A little care has to be taken in order to accommodate the fact that the return type of the pattern-matching depends on the scrutinee, in particular when it is the default constructor of the inductive type.

In what follows, we write out the sequence i_1 ... i_m for clarity, but compact it to i⃗ for space reasons when appropriate.

Definition 12. *Assume an inductive* I *as given in Definition 6. Let* Q *be the well-typed pattern-matching defined as*

$$
\begin{array}{l}
\mathsf{match}\ M\ \mathsf{return}\ \lambda (i_1 : I_1) \ldots (i_m : I_m)\ (x : \mathcal{I}\ X_1 \ldots X_n\ i_1 \ldots i_m).\ R\ \mathsf{with}\\
\quad \mid\ c_1\ a_{1,1} \ldots a_{1,l_1} \Rightarrow N_1\\
\quad \ldots\\
\quad \mid\ c_k\ a_{k,1} \ldots a_{k,l_k} \Rightarrow N_k\\
\mathsf{end}
\end{array}
$$

*where*

$$
\Gamma \vdash \vec{X} : \vec{P} \qquad \Gamma \vdash \vec{Y} : \vec{I}\{\vec{p} := \vec{X}\} \qquad \Gamma \vdash M : \mathcal{I}\ X_1 \ldots X_n\ Y_1 \ldots Y_m
$$

$$
\Gamma, \vec{i} : \vec{I}\{\vec{p} := \vec{X}\}, x : \mathcal{I}\ \vec{X}\ \vec{i} \vdash R : \square \qquad \Gamma \vdash Q : R\{\vec{i} := \vec{Y}, x := M\}
$$

$$
\Gamma, \vec{a}_1 : \vec{A}_1 \vdash N_1 : R\{\vec{i} := \vec{V}_1\{\vec{p} := \vec{X}\}, x := c_1\ \vec{X}\ \vec{a}_1\}
$$

$$
\ldots
$$

$$
\Gamma, \vec{a}_k : \vec{A}_k \vdash N_k : R\{\vec{i} := \vec{V}_k\{\vec{p} := \vec{X}\}, x := c_k\ \vec{X}\ \vec{a}_k\}
$$

*then we pose* [Q] *to be the following pattern-matching:*

$$
\begin{array}{l}
\mathsf{match}\ [M]\ \mathsf{return}\ \lambda (i_1 : [[I_1]]) \ldots (i_m : [[I_m]])\ (x : \mathcal{I}^\bullet\ [X_1] \ldots [X_n]\ i_1 \ldots i_m).\ [[R]]\ \mathsf{with}\\
\quad \mid\ c_1^\bullet\ a_{1,1} \ldots a_{1,l_1} \Rightarrow [N_1]\\
\quad \ldots\\
\quad \mid\ c_k^\bullet\ a_{k,1} \ldots a_{k,l_k} \Rightarrow [N_k]\\
\quad \mid\ \mathcal{I}_{\varnothing}\ i_1 \ldots i_m\ e \Rightarrow [R]_{\varnothing}\{x := \mathcal{I}_{\varnothing}\ [X_1] \ldots [X_n]\ i_1 \ldots i_m\ e\}\ e\\
\mathsf{end}
\end{array}
$$

Lemma 13. *With notations and typing assumptions from Definition 12, we have*

$$[[\Gamma]] \vdash [Q] : [[R]]\{\vec{i} := [\vec{Y}], x := [M]\}.$$

*Proof.* Mostly a consequence of Theorem 4 applied to all of the premises of the pattern-matching rule. The only thing we have to check specifically is that the branch for the default constructor I<sup>∅</sup> is well-typed as

$$[[\Gamma]], \vec{i} : [[\vec{I}\{\vec{p} := \vec{X}\}]], e : \mathbb{E} \vdash [R]_{\varnothing}\{x := \mathcal{I}_{\varnothing}\ [\vec{X}]\ \vec{i}\ e\}\ e : [[R]]\{x := \mathcal{I}_{\varnothing}\ [\vec{X}]\ \vec{i}\ e\}$$

which is also due to Theorem 4 applied to R.

Lemma 14. *The translation preserves* ι*-rules.*

*Proof.* Immediate, as the translation preserves the structure of the patterns.

The translation is also applicable to fixpoints, but for the sake of readability we do not fully spell it out here: it is simply defined by congruence (commutation with the syntax). As such, it trivially preserves typing and reduction rules. Note that the Coq plugin presented in Sect. 6 features a complete translation of inductive types, pattern-matching and fixpoints, so the interested reader may experiment with the plugin to see how fixpoints are translated.

Therefore, by summarizing all of the previous properties, we have the following result.

Theorem 15. *If* T *interprets* CIC*, then so does* T<sup>E</sup>*, and thus the exceptional translation is a syntactic model of* CIC*.*

### 2.3 Flirting with Inconsistency

It is now time to point at the elephant in the room. The exceptional translation has a lot of nice properties, but it has one grave defect.

Theorem 16. *If* <sup>E</sup> *is inhabited, then* <sup>T</sup><sup>E</sup> *is logically inconsistent.*

*Proof.* The empty type is translated as

Ind empty• : □ := | empty∅ : E → empty•

which is inhabited as soon as E is.
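Concretely, the inconsistency is visible in any tagged-union rendering of the translation; in the illustrative Python encoding used throughout (E := str), the translated empty type has the default constructor as its only constructor, so every exception yields a closed inhabitant.

```python
# empty• has a single constructor, the default one (encoding ours, E := str):
def empty_err(e):
    return ("empty_err", e)

# a closed "proof" of empty in T_E, available as soon as E is inhabited:
proof_of_false = empty_err("boom")
print(proof_of_false)   # ('empty_err', 'boom')
```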

Note that when E is empty, the situation is hardly better, as the translation is essentially the identity. However, when T satisfies canonicity, the situation is not totally desperate as T<sup>E</sup> enjoys the following weaker canonicity lemma.

Lemma 17 (Exceptional Canonicity). *Let* I *be an inductive type with constructors* c_1, ..., c_n *and assume that* T *satisfies canonicity. The translation of any closed term* ⊢ M : I *in* T<sup>E</sup> *evaluates either to a constructor of the form* c_i• N_1 ... N_{l_i} *or to the default constructor* I∅ e *for some* e : E*.*

*Proof.* Direct application of Theorem 4 and canonicity of T .

A direct consequence of Lemma 17 is that any proof of the empty type is an exception. As we will see in Sect. 4.1, for some types it is also possible to dynamically check whether a term of this type is a correct proof, in the sense that it does not raise an uncaught exception. This means that while T<sup>E</sup> is logically unsound, it is computationally relevant and can still be used as a *dependently-typed programming language with exceptions*, a shift into a realm where we would have called the weaker canonicity Lemma 17 a *progress lemma*.

This is not the end of the story, though. Recall that T<sup>E</sup> only exists through its embedding [·] into T. In particular, if T is consistent, this means that one can reason about terms of T<sup>E</sup> directly in T. For instance, it is possible to prove in T that, assuming some properties about its input, a function in T<sup>E</sup> never raises an exception. Hence not only do we have an effectful programming language, but we also have a *sound logical framework* allowing us to transparently prove safety properties about impure programs.

It is actually even better than that. We will show in Sect. 3 that safety properties can be derived automatically for pure programs, allowing to recover a consistent type theory as long as T is consistent itself.

### 2.4 Living in an Exceptional World

We describe here what T<sup>E</sup> feels like in direct style. The exceptional theory features a new type **E** which reifies the underlying type E of exceptions in T<sup>E</sup>. It uses the fact that for **E**, the default function (here of type E → E) can simply be chosen to be the identity function. Its translation is given by

$$[\mathbf{E}] : [[\square]] := \mathsf{TypeVal}\ \mathbb{E}\ (\lambda e : \mathbb{E}.\ e).$$

Then, it is possible to define in T<sup>E</sup> a function raise : ΠA : □. **E** → A that raises the provided exception at any type, as

$$[\mathsf{raise}] := \lambda (A : \mathsf{type})\ (e : \mathbb{E}).\ \mathsf{Err}\ A\ e.$$

As we have already mentioned, the reader should be aware that the exceptions arising from this translation are call-by-name. This means that they do not behave like their usual call-by-value counterpart. In particular, we have in T<sup>E</sup>

$$\mathsf{raise}\ (\Pi x : A.\ B)\ e \equiv \lambda x : A.\ \mathsf{raise}\ B\ e$$

which means that exceptions cannot be caught on Π-types. We can catch them on universes and inductive types though, because in those cases they are freely added through an extra constructor which one can pattern-match on. For instance, there exists in T<sup>E</sup> a term

$$\mathsf{catch}_{\mathsf{bool}} : \Pi P : \mathsf{bool} \to \square.\ P\ \mathsf{true} \to P\ \mathsf{false} \to (\Pi e : \mathbf{E}.\ P\ (\mathsf{raise}\ \mathsf{bool}\ e)) \to \Pi b : \mathsf{bool}.\ P\ b$$

defined by

$$
\begin{aligned}
[\mathsf{catch}_{\mathsf{bool}}] &:= \lambda P\ p_t\ p_f\ p_e\ b.\ \mathsf{match}\ b\ \mathsf{return}\ \lambda b.\ \mathsf{El}\ (P\ b)\ \mathsf{with}\\
&\qquad \mid\ \mathsf{true}^\bullet \Rightarrow p_t\\
&\qquad \mid\ \mathsf{false}^\bullet \Rightarrow p_f\\
&\qquad \mid\ \mathsf{bool}_{\varnothing}\ e \Rightarrow p_e\ e\\
&\qquad \mathsf{end}
\end{aligned}
$$

satisfying the expected reduction rules on all three cases.

In Sect. 6, we illustrate the use of the exceptional theory using the Coq plugin to define a simple cast framework as in [12].

```
[□i]ε         := λA : [[□i]]. [[A]] → □i
[x]ε          := xε
[λx : A. M]ε  := λ(x : [[A]]) (xε : [[A]]ε x). [M]ε
[M N]ε        := [M]ε [N] [N]ε
[Πx : A. B]ε  := λ(f : Πx : [[A]]. [[B]]). Π(x : [[A]]) (xε : [[A]]ε x). [[B]]ε (f x)
[[A]]ε        := [A]ε
[[·]]ε        := ·
[[Γ, x : A]]ε := [[Γ]]ε, x : [[A]], xε : [[A]]ε x
```

Fig. 5. Parametricity over exceptional translation

### 3 Kreisel Meets Martin-Löf

It is well-known that Reynolds' parametricity [13] and Kreisel's modified realizability [4] are two instances of the broader logical relation techniques. Usually, parametricity is used to derive theorems for free, while realizability constrains programs. In a surprising turn of events, we use Bernardy's variant of parametricity on CIC [5] as a realizability trick to evict undesirable behaviours of T<sup>E</sup>. This leads to the *parametric exceptional translation*, which can be seen as the embodiment of Kreisel's realizability in type theory. In this section, we first present this translation on the negative fragment, then extend it to CIC and finally discuss its meta-theoretical properties.

### 3.1 Exceptional Parametricity in a Negative World

The exceptional parametricity translation for terms of CC<sup>ω</sup> is defined in Fig. 5. Intuitively, any type A in T<sup>E</sup> is turned into a validity predicate A_ε : A → □ which encodes the fact that an inhabitant of A is not allowed to generate unhandled exceptions. For instance, a function is valid if its application to a valid term produces a valid answer. It does not say anything about the application to invalid terms though, which amounts to a *garbage in, garbage out* policy. The translation then states that every pure term is automatically valid.
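The "garbage in, garbage out" clause for function types can be spot-checked on a finite domain. A Python sketch over the translated booleans, in the illustrative encoding used earlier (E := str):

```python
# Validity of a function = valid arguments go to valid results; nothing is
# required on invalid inputs.  bool is finite, so the defining clause of the
# validity predicate can be checked exhaustively.  Encoding ours, E := str.

def true_():      return ("true",)
def false_():     return ("false",)
def bool_err(e):  return ("bool_err", e)

def valid_bool(b):
    return b in (true_(), false_())

def valid_bool_fun(f):
    return all(valid_bool(f(b)) for b in (true_(), false_()))

negate = lambda b: false_() if b == true_() else true_()
always_fail = lambda _b: bool_err("boom")

print(valid_bool_fun(negate))       # True
print(valid_bool_fun(always_fail))  # False
```

Note that `negate` maps the invalid input `bool_err("e")` to `true_()`; that is irrelevant to its validity, which is the garbage-in, garbage-out policy in action.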

This translation is exactly standard parametricity for type theory [5], but parametrized by the exceptional translation. This means that any occurrence of a term of the original theory used in the parametricity translation is replaced by its exceptional translation, using [·] or [[·]] depending on whether it is used as a term or as a type. For instance, the translation of an application [M N]_ε is given by [M]_ε [N] [N]_ε instead of just [M]_ε N [N]_ε.

Lemma 18 (Substitution lemma). *The translation satisfies the following conversion:* [M{x := N}]_ε ≡ [M]_ε{x := [N], x_ε := [N]_ε}*.*

Theorem 19 (Soundness). *The two following properties hold.*

*– If* M ≡ N *then* [M]_ε ≡ [N]_ε*.*
*– If* Γ ⊢ M : A *then* [[Γ]]_ε ⊢ [M]_ε : [[A]]_ε [M]*.*

*Proof.* By induction on the derivation.

We can use this result to construct another syntactic model of CC<sup>ω</sup>. Contrary to usual syntactic models, where sequents are straightforwardly translated to sequents, this model is slightly more subtle, as sequents are translated to pairs of sequents instead. This is similar to the usual parametricity translation.

Definition 20. *The theory* T^p_E *is defined by the following data:*

*–* terms and contexts of T^p_E are terms and contexts of T;
*–* Γ ⊢ M : A holds in T^p_E whenever both [[Γ]] ⊢ [M] : [[A]] and [[Γ]]_ε ⊢ [M]_ε : [[A]]_ε [M] hold in T.
Once again, Theorem 19 can be rephrased in terms of preservation of theories and syntactic models.

Theorem 21. *If* T *interprets* CC<sup>ω</sup> *then so does* T^p_E*. That is, the parametric exceptional translation is a syntactic model of* CC<sup>ω</sup>*.*

This construction preserves definitional η-expansion, as functions are mapped to (slightly more complicated) functions.

Lemma 22. *If* T *satisfies definitional* η*-expansion, then so does* T^p_E*.*

*Proof.* The first component of the translation preserves definitional η-expansion because functions are mapped to functions. It remains to show that

$$[\lambda x : A.\ M\ x]_{\varepsilon} := \lambda (x : [[A]])\ (x_{\varepsilon} : [[A]]_{\varepsilon}\ x).\ [M]_{\varepsilon}\ x\ x_{\varepsilon} \equiv [M]_{\varepsilon}$$

which holds by applying η-expansion twice.

It is interesting to remark that Bernardy-style unary parametricity also leads to a syntactic model T^p that interprets CC<sup>ω</sup> (as well as CIC), using the same kind of glueing construction. Nonetheless, this model is somewhat degenerate from the logical point of view, namely it is a conservative extension of the target theory. Indeed, if Γ ⊢ M : A holds in T^p for some Γ, M and A from T, then we also have Γ ⊢ M : A in T, because the first component of the model is the identity, and the original sequent can be retrieved by the first projection.

This is definitely *not* the case with the T^p_E theory, because the first projection is not the identity. In particular, because of Theorem 16, every sequent in the first projection is inhabited, although this is not the case in T itself if it is consistent. This means that parametricity can actually bring additional expressivity when it applies to a theory which is not pure, as is the case here.

```
Ind boolε : bool• → □ :=
 | trueε  : boolε true•
 | falseε : boolε false•

Ind listε (A : type) (Aε : [[A]] → □) : list• A → □ :=
 | nilε  : listε A Aε (nil• A)
 | consε : Π(x : [[A]]) (xε : Aε x) (l : list• A) (lε : listε A Aε l).
           listε A Aε (cons• A x l)

Ind eqε (A : type) (Aε : [[A]] → □) (x : [[A]]) (xε : Aε x) :
    Π(y : [[A]]) (yε : Aε y). eq• A x y → □ :=
 | reflε : eqε A Aε x xε x xε (refl• A x)
```
Fig. 6. Examples of parametric translation of inductive types

### 3.2 Exceptional Parametric Translation of **CIC**

We now describe the parametricity translation of the positive fragment. The intuition is that as it stands for an exception, the default constructor is always invalid, while all other constructors are valid, assuming their arguments are.

### Type and Constructor Translation

Definition 23. *Let* I *be an inductive type as given in Definition 6. We define the exceptional parametricity translation* I<sup>ε</sup> *of* I *as the inductive type defined by:*

$$
\begin{array}{l}
\textit{– parameters } [p_1 : P_1, \ldots, p_n : P_n]_{\varepsilon};\\
\textit{– indices } [i_1 : I_1, \ldots, i_m : I_m]_{\varepsilon},\ x : \mathcal{I}^\bullet\ p_1 \ldots p_n\ i_1 \ldots i_m;\\
\textit{– constructors}\\
\quad c_{1\varepsilon} : \Pi [a_{1,1} : A_{1,1}, \ldots, a_{1,l_1} : A_{1,l_1}]_{\varepsilon}.\\
\qquad \mathcal{I}_{\varepsilon}\ p_1\ p_{1\varepsilon} \ldots p_n\ p_{n\varepsilon}\ [V_{1,1}]\ [V_{1,1}]_{\varepsilon} \ldots [V_{1,m}]\ [V_{1,m}]_{\varepsilon}\ (c_1^\bullet\ \vec{p}\ \vec{a}_1)\\
\quad \ldots\\
\quad c_{k\varepsilon} : \Pi [a_{k,1} : A_{k,1}, \ldots, a_{k,l_k} : A_{k,l_k}]_{\varepsilon}.\\
\qquad \mathcal{I}_{\varepsilon}\ p_1\ p_{1\varepsilon} \ldots p_n\ p_{n\varepsilon}\ [V_{k,1}]\ [V_{k,1}]_{\varepsilon} \ldots [V_{k,m}]\ [V_{k,m}]_{\varepsilon}\ (c_k^\bullet\ \vec{p}\ \vec{a}_k).
\end{array}
$$

*and we extend the translation as*

[I]_ε := I_ε, [c_1]_ε := c_1ε, ..., [c_k]_ε := c_kε.

*Example 24.* We give the exceptional parametric inductive translation of our running examples in Fig. 6.

Note that contrarily to the negative case, the exceptional parametricity translation on inductive types is *not* the same thing as the composition of Bernardy's parametricity together with the exceptional translation. Indeed, the latter would also have produced a constructor for the default case from the exceptional inductive translation, whereas our goal is precisely to rule this case out via the additional realizability-like interpretation.

It is also very different from our previous parametric weaning translation [2], which relies on internal parametricity to recover dependent elimination, enforcing by construction that no effectful term exists. Here, effectful terms may be used in the first component, but they are required after the fact to have no inconsistent behaviour. Intuitively, parametric weaning produces one pure sequent, while exceptional parametricity produces two, with the first one being potentially impure and the second one assuring the first one is harmless.

### Pattern-Matching Translation

Definition 25. *Let* Q *be the pattern-matching defined in Definition 12. We pose* [Q] <sup>ε</sup> *to be the pattern-matching*

$$
\begin{array}{l}
\mathsf{match}\ [M]_{\varepsilon}\ \mathsf{return}\ \lambda [[\vec{i} : \vec{I}\,]]_{\varepsilon}\ (x : \mathcal{I}^\bullet\ [X_1] \ldots [X_n]\ i_1 \ldots i_m)\\
\qquad\qquad (x_{\varepsilon} : \mathcal{I}_{\varepsilon}\ [X_1]\ [X_1]_{\varepsilon} \ldots [X_n]\ [X_n]_{\varepsilon}\ i_1\ i_{1\varepsilon} \ldots i_m\ i_{m\varepsilon}\ x).\ [[R]]_{\varepsilon}\ [Q_x]\ \mathsf{with}\\
\quad \mid\ c_{1\varepsilon}\ a_{1,1}\ a_{1,1\varepsilon} \ldots a_{1,l_1}\ a_{1,l_1\varepsilon} \Rightarrow [N_1]_{\varepsilon}\\
\quad \ldots\\
\quad \mid\ c_{k\varepsilon}\ a_{k,1}\ a_{k,1\varepsilon} \ldots a_{k,l_k}\ a_{k,l_k\varepsilon} \Rightarrow [N_k]_{\varepsilon}\\
\mathsf{end}
\end{array}
$$

*where* Q<sup>x</sup> *is the following pattern-matching*

$$
\begin{array}{l}
\mathsf{match}\ x\ \mathsf{return}\ \lambda (i_1 : I_1) \ldots (i_m : I_m)\ (x : \mathcal{I}\ X_1 \ldots X_n\ i_1 \ldots i_m).\ R\ \mathsf{with}\\
\quad \mid\ c_1\ a_{1,1} \ldots a_{1,l_1} \Rightarrow N_1\\
\quad \ldots\\
\quad \mid\ c_k\ a_{k,1} \ldots a_{k,l_k} \Rightarrow N_k\\
\mathsf{end}
\end{array}
$$

*that is* Q *where the scrutinee has been turned into the index variable of the parametricity predicate.*

Lemma 26. *With notations and typing assumptions from Definition 12, we have*

$$[[\Gamma]]_{\varepsilon} \vdash [Q]_{\varepsilon} : [[R\{\vec{i} := \vec{Y}, x := M\}]]_{\varepsilon}\ [Q].$$

The exceptional parametricity translation can be extended to handle fixpoints as well, with a few limitations. Translating generic fixpoints uniformly is indeed an open problem in standard parametricity, and our variant faces the same issue. In practice, standard recursors can be automatically translated, and fancy fixpoints may require hand-writing the parametricity proof. We do not describe the recursor translation here though, as it is essentially the same as standard parametricity. Again, the interested reader may test the Coq plugin exposed in Sect. 6 to see how recursors are translated.

Packing everything together allows us to state the following result.

Theorem 27. *If* T *interprets* CIC*, then so does* T^p_E*, and thus the exceptional parametricity translation is a syntactic model of* CIC*.*

### 3.3 Meta-Theoretical Properties of T^p_E

Being built as a syntactic model, T^p_E inherits many meta-theoretical properties of T. We list a few of interest below.

Theorem 28. *If* T *is consistent, then so is* T^p_E*.*

*Proof.* Assume ⊢ M_0 : empty in T^p_E for some M_0. Then by definition, there exist two terms M and M_ε such that ⊢ M : empty• and ⊢ M_ε : empty_ε M in T. But empty_ε has no constructor, so T is inconsistent, contradicting the assumption.

More generally, the same argument holds for any inductive type.

Theorem 29. *If* T *enjoys canonicity, then so does* T<sup>p</sup><sub>E</sub>*.*

*Proof.* The exceptional parametricity translation for inductive types has the same structure as the original type, so any normal form in T<sup>p</sup><sub>E</sub> can be mapped back to a normal form in T.

### 4 Effectively Extending **CIC**

The parametric exceptional translation allows us to extend the logical expressivity of CIC in the following ways, which we develop in the remainder of this section.

We show in Sect. 4.1 that Markov's rule is admissible in CIC. We already sketched this result in our previous paper [2], but we come back to it in more detail. More generally, we show a form of conservativity of double-negation elimination over the type-theoretic version of Π<sup>0</sup><sub>2</sub> formulae.

In Sect. 4.2, we exhibit a syntactic model of CIC which satisfies definitional η-expansion for functions but negates function extensionality. As far as we know, such a model was not previously known.

Finally, in Sect. 4.3, we show that there exists a model of CIC which validates the independence of premises. This is a new result, which shows that CIC can feature traces of classical reasoning while staying computational. We use this result in Sect. 4.4 to give an alternative proof of the recent result of Coquand and Mannaa [7] that Markov's principle is not provable in CIC.

#### 4.1 Markov's Rule

We show in this section that CIC is closed under a generalized Markov's rule. The technique used here is no more than a dependently-typed variant of Friedman's trick [14]. Indeed, Friedman's A-translation amounts to adding exceptions to intuitionistic logic, which is precisely what T<sub>E</sub> does for CIC.

Definition 30. *An inductive type in* CIC *is said to be first-order if all the types of the arguments of its constructors, of its parameters and of its indices are recursively first-order.*

*Example 31.* The empty, unit and <sup>N</sup> types are first-order. If <sup>P</sup> and <sup>Q</sup> are first-order, then so are <sup>Σ</sup><sup>p</sup> : P. Q, <sup>P</sup> <sup>+</sup> <sup>Q</sup> and eq P p<sub>0</sub> p<sub>1</sub>. Consequently, the CIC equivalents of Σ<sup>0</sup><sub>1</sub> formulae are in particular first-order.

First-order types enjoy uncommon properties, like the fact that they can be injected into effectful terms and purified away. This is then used to prove the generalized Markov's Rule.

Lemma 32. *For every first-order type* p : P ⊢ Q : □*, where all the* P *are first-order, there are retractions* ι<sub>P</sub>*,* ι<sub>Q</sub> *and* θ<sub>P</sub>*,* θ<sub>Q</sub> *s.t.:*

$$\begin{array}{l} \vec{p} : \vec{P} \vdash \iota_{Q} : Q \to \llbracket Q \rrbracket\{\vec{p} := \iota_{\vec{P}}\ \vec{p}\} \\ \vec{p} : \vec{P} \vdash \theta_{Q} : \llbracket Q \rrbracket\{\vec{p} := \iota_{\vec{P}}\ \vec{p}\} \to Q \end{array}$$

*Proof.* The ι terms exist because effectful inductive types are a semantical superset of their pure equivalents, and the θ terms are implemented by recursively forcing the corresponding impure inductive term. One relies on decidability of equality of first-order types to fix the indices.
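For instance, at the first-order type N the retraction can be sketched in plain Coq as follows; natE and all other names are ours, standing in for the plugin's translated type, and E is an arbitrary exception type:

```coq
(* A hedged sketch of Lemma 32 at type nat. natE stands in for the
   exceptional translation of nat; all names are illustrative. *)
Inductive E : Type := tt_exn : E.

Inductive natE : Type :=
| OE : natE
| SE : natE -> natE
| natE_err : E -> natE.  (* default constructor: a raised exception *)

(* iota injects a pure nat into the effectful type. *)
Fixpoint iota (n : nat) : natE :=
  match n with
  | O => OE
  | S m => SE (iota m)
  end.

(* theta purifies an effectful nat by recursively forcing it; the
   exceptional branch is never reached on images of iota. *)
Fixpoint theta (n : natE) : nat :=
  match n with
  | OE => O
  | SE m => S (theta m)
  | natE_err _ => O
  end.

(* nat is a retract of natE: theta after iota is the identity. *)
Lemma theta_iota : forall n, theta (iota n) = n.
Proof. induction n; simpl; congruence. Qed.
```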

Theorem 33 (Generalized Markov's Rule). *For any first-order type* P *and first-order predicate* Q *over* P*, if* CIC ⊢ Πp : P. ¬¬(Q p) *then* CIC ⊢ Πp : P. Q p*.*

*Proof.* Let M : Πp : P. ¬¬(Q p). By taking E := Q p and applying the soundness theorem, one gets a proof

$$p : P \vdash [M] : \Pi \hat{p} : [P].\ (\llbracket Q\ \hat{p}\rrbracket \to \texttt{empty}^\bullet) \to \texttt{empty}^\bullet.$$

But empty<sup>•</sup> ≅ E ≡ Q p, so we can derive from [M] a term M<sup>♯</sup> s.t.

$$p : P \vdash M^\sharp : \Pi \hat{p} : [P].\ (\llbracket Q\ \hat{p}\rrbracket \to Q\ p) \to Q\ p.$$

The proof term we were looking for is thus no more than λp : P. M<sup>♯</sup> (ι<sub>P</sub> p) θ<sub>Q</sub>.

### 4.2 Function Intensionality with *η*-expansion

In a previous paper [9], we already showed that there exists a syntactic model of CIC in which function extensionality can be internally disproved. Yet, this model clearly did not preserve definitional η-expansion on functions, as it added extra structure to abstraction and application (namely a boolean). Thanks to our new model, we can now demonstrate that, counterintuitively, it is possible to have a consistent type theory that enjoys definitional η-expansion while internally negating function extensionality. In this section we suppose that E := unit, although any inhabited type of exceptions would work.

By Lemma 22, we know that the parametric exceptional translation preserves definitional η-expansion. It is thus sufficient to find two functions that are extensionally equal but intensionally distinct in the model. Let us consider to this end the unit <sup>→</sup> unit functions

$$\mathbf{id}\_{\perp} := \lambda u : \mathbf{unit}. \, u \qquad \qquad \mathbf{id}\_{\top} := \lambda u : \mathbf{unit}. \, \mathbf{tt}.$$

Theorem 34. *The following sequents are derivable:*

T<sup>p</sup><sub>E</sub> ⊢ Πu : unit. id<sub>⊥</sub> u = id<sub>⊤</sub> u     T<sup>p</sup><sub>E</sub> ⊢ (id<sub>⊥</sub> = id<sub>⊤</sub>) → empty.

*Proof.* The main difference between the two functions is that id<sub>⊥</sub> preserves exceptions while id<sub>⊤</sub> does not, which we exploit.

The first sequent is provable in CIC by dependent elimination, and thus is derivable in T<sup>p</sup><sub>E</sub> by applying the soundness theorem.

To prove the second sequent, we exhibit a property that discriminates [id<sub>⊥</sub>] and [id<sub>⊤</sub>], which is, as explained, their evaluation on the term unit<sub>∅</sub> tt. Showing then that this proof is parametric is equivalent to showing Π(p : [[id<sub>⊥</sub> = id<sub>⊤</sub>]]) (p<sub>ε</sub> : [[id<sub>⊥</sub> = id<sub>⊤</sub>]]<sub>ε</sub> p). empty. But p<sub>ε</sub> actually implies [id<sub>⊥</sub>] = [id<sub>⊤</sub>], which we just showed was absurd.
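The discriminating behaviour can be replayed in plain Coq. In the hedged sketch below, unitE and all other names are ours, standing in for the translated unit type with its default exception-carrying constructor:

```coq
(* Illustrative names; E is an arbitrary inhabited exception type. *)
Inductive E : Type := e0 : E.

(* Sketch of the exceptional translation of unit: the pure
   constructor plus a default constructor carrying an exception. *)
Inductive unitE : Type :=
| ttE : unitE
| unit_err : E -> unitE.

(* Counterparts of the two functions: both have type unitE -> unitE. *)
Definition id_bot := fun u : unitE => u.    (* propagates exceptions *)
Definition id_top := fun _ : unitE => ttE.  (* swallows exceptions *)

(* They agree on the pure value... *)
Lemma agree_tt : id_bot ttE = id_top ttE.
Proof. reflexivity. Qed.

(* ...but differ on the exceptional inhabitant, so the translated
   functions are not equal, refuting extensionality internally. *)
Lemma differ_err : id_bot (unit_err e0) <> id_top (unit_err e0).
Proof. discriminate. Qed.
```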

### 4.3 Independence of Premise

Independence of premise (IP) is a semi-classical principle from first-order logic whose CIC equivalent can be stated as follows.

$$\Pi(A:\square) \left( B:\mathbb{N} \to \square \right) . \left( \neg A \to \Sigma n: \mathbb{N}. B \text{ } n \right) \to \Sigma n: \mathbb{N}. \neg A \to B \text{ } n \quad \text{(IP)}$$

Although not derivable in intuitionistic logic, it is an admissible rule of **HA**. The standard proof of this property goes through Kreisel's modified realizability interpretation of **HA** [4]. In a nutshell, the interpretation is as follows: by induction over a formula A, define a simple type τ(A) of realizers of A together with a realizability predicate · ⊩ A over τ(A). Then show that whenever **HA** ⊢ A, there exists some simply-typed term t : τ(A) s.t. t ⊩ A. As the interpretation also implies that there is no t s.t. t ⊩ ⊥, this gives a sound model of **HA**, which validates more than the latter. Most notably, there is for instance a term ip s.t.

$$\mathbf{ip} \Vdash (\neg A \to \exists n. B) \to \exists n. \neg A \to B$$

for any A, B. Intriguingly, the computational content of ip did not seem to receive a fair treatment in the literature. To the best of our knowledge, it has never been explicitly stated that IP was realizable because of the following "bug" of Kreisel's modified realizability.

Lemma 35 (Kreisel's bug). *For every formula* A*,* τ(A) *is inhabited. In particular,* τ(⊥) := unit*.*

We show that this is actually not a bug, but a hidden feature of Kreisel's modified realizability, which secretly allows one to encode exceptions in the realizers. To this end, we implement IP in T<sup>p</sup><sub>E</sub> by relying internally on *paraproofs*, i.e. terms raising exceptions, while ensuring these exceptions never escape outside of the locally unsafe boundary. The resulting T<sup>p</sup><sub>E</sub> term has essentially the same computational content as its counterpart in Kreisel's realizability. In this section we suppose that E := unit, although assuming E to be inhabited is sufficient.

To ease the understanding of the definition, we rely on effectful combinators that can be defined in T<sup>E</sup>.

Definition 36. *We define in* T<sup>E</sup> *the following terms.*

fail : ΠA : □. A
[fail] := λA : [[□]]. [A]<sub>∅</sub> tt

is<sub>Σ</sub> : ΠA B. (Σx : A. B) → bool
[is<sub>Σ</sub>] := λA B p. match p with | ex<sup>•</sup> \_ \_ ⇒ true<sup>•</sup> | Σ<sub>∅</sub> \_ ⇒ false<sup>•</sup> end

is<sub>N</sub> : N → bool
[is<sub>N</sub>] := fix is<sub>N</sub> n := match n with | O<sup>•</sup> ⇒ true<sup>•</sup> | S<sup>•</sup> n ⇒ is<sub>N</sub> n | N<sub>∅</sub> \_ ⇒ false<sup>•</sup> end

It is worth insisting that these combinators are not necessarily parametric. While it can be shown that is<sub>Σ</sub> and is<sub>N</sub> actually are, fail luckily is not. The is<sub>Σ</sub> and is<sub>N</sub> functions are used to check that a value is actually pure and does not contain exceptions.

Definition 37. *We define* ip *in* <sup>T</sup><sup>E</sup> *in direct style below, using the available combinators from Definition 36 and a bit of syntactic sugar.*

> ip : IP
>
> ip := λ(A : □) (B : N → □) (f : ¬A → Σn : N. B n).
> let p := f (fail (¬A)) in
> if is<sub>Σ</sub> N B p then
> match p with
> | ex n b ⇒ if is<sub>N</sub> n then ex \_ \_ n (λ\_ : ¬A. b) else ex \_ \_ O (fail (¬A → B O))
> end
> else ex \_ \_ O (fail (¬A → B O))

The intuition behind this term is the following. Given <sup>f</sup> : <sup>¬</sup><sup>A</sup> <sup>→</sup> <sup>Σ</sup><sup>n</sup> : <sup>N</sup>.B n, we apply it to a dummy function which fails whenever it is used. Owing to the semantics of negation, we know *in the parametricity layer* that the only way for this application to return an exception is that f actually contained a proof of <sup>A</sup> and applied fail to it. Therefore, given a true proof of <sup>¬</sup>A, we are in an inconsistent setting and thus we are able to do whatever pleases us. The issue is that we do not have access to such a proof yet, and we do have to provide a valid integer now. Therefore, we check whether f actually provided us with a valid pair containing a valid integer. If so, this is our answer, otherwise we stuff a dummy integer value and we postpone the contradiction.
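The "check purity, then commit or postpone" pattern can be caricatured in simply-typed Coq; in this hedged sketch exceptions are modelled by an explicit error constructor, the dependent parts of IP are dropped, and all names are ours:

```coq
(* A simply-typed caricature of ip. All names are illustrative. *)
Inductive excnat : Type :=
| PureN : nat -> excnat
| ErrN  : excnat.   (* a raised exception *)

(* Purity check, in the role of is_N from Definition 36. *)
Definition is_pure (n : excnat) : bool :=
  match n with PureN _ => true | ErrN => false end.

(* Feed f a failing argument; commit to its answer when it is a
   pure integer, otherwise postpone with a dummy witness (0). *)
Definition ip_skeleton (f : excnat -> excnat) : nat :=
  let p := f ErrN in
  if is_pure p
  then match p with PureN n => n | ErrN => 0 end  (* commit *)
  else 0.                                         (* postpone *)

(* A function that never uses its argument yields its pure answer... *)
Example commit : ip_skeleton (fun _ => PureN 3) = 3.
Proof. reflexivity. Qed.

(* ...while one that forces the paraproof gets the dummy witness. *)
Example postpone : ip_skeleton (fun x => x) = 0.
Proof. reflexivity. Qed.
```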

This is essentially the same realizer as the one from Kreisel's modified realizability, except that we have a fancy type system for realizers. In particular, because we have dependent types, integers also exist in the logical layer, so that they need to be checked for exceptions as well. The only thing that remains to be proved is that ip also lives in T<sup>p</sup><sub>E</sub>.

Theorem 38. *There is a proof of* [[IP]]<sub>ε</sub> [ip] *in* T*.*

*Proof.* The proof is straightforward but tedious, so we do not give the full details. The file IPc.v of the companion Coq plugin contains an explicit proof. The essential properties that make it go through are the following.


Corollary 39. *We have* T<sup>p</sup><sub>E</sub> ⊢ IP*.*

### 4.4 Non-provability of Markov's Principle

From this result, one can derive a very easy syntactic proof of the independence of Markov's principle from CIC. Markov's principle is usually stated as

ΠP : N → bool. ¬¬(Σn : N. P n = true) → Σn : N. P n = true     (MP)

An independence result was recently proved by Coquand and Mannaa by a semantic argument [7]. We leverage instead a property from realizability [15] that has been applied to type theory the other way around by Herbelin [16].

Lemma 40. *If* S *is a computable theory containing* CIC *and enjoying canonicity, then one cannot have both* S ⊢ IP *and* S ⊢ MP*.*

*Proof.* By applying IP to MP, one easily obtains that

S ⊢ ΠP : N → bool. Σn : N. Πm : N. P m = true → P n = true.

Thus, for every closed P : N → bool, by canonicity there exists a closed n<sub>P</sub> : N s.t. S ⊢ Πm : N. P m = true → P n<sub>P</sub> = true. But then one can decide whether P holds for some n by just computing P n<sub>P</sub>, so that we effectively obtain an oracle deciding the halting problem (which is expressible in CIC).

Corollary 41. *We have* CIC<sup>p</sup><sub>E</sub> ⊬ MP*, and thus also* CIC ⊬ MP*.*

### 5 Possible Extensions

### 5.1 Negative Records

Interestingly, the fact that the translation introduces effects has unintended consequences on a few properties of type theory that are often taken for granted. Namely, because type theory is pure, there is a widespread confusion amongst type theorists between positive tuples and negative records.


$$A, B, M, N ::= \ldots \mid \&x : A.\, B \mid \langle M, N \rangle \mid M.\pi_1 \mid M.\pi_2$$

$$\frac{\Gamma \vdash A : \Box_i \quad \Gamma, x : A \vdash B : \Box_j}{\Gamma \vdash \&x : A.\, B : \Box_{\max(i,j)}} \qquad \frac{\Gamma \vdash M : \&x : A.\, B}{\Gamma \vdash M.\pi_1 : A} \qquad \frac{\Gamma \vdash M : \&x : A.\, B}{\Gamma \vdash M.\pi_2 : B\{x := M.\pi_1\}}$$

$$\frac{\Gamma \vdash M : A \qquad \Gamma, x : A \vdash B : \Box \qquad \Gamma \vdash N : B\{x := M\}}{\Gamma \vdash \langle M, N \rangle : \&x : A.\, B}$$

Fig. 7. Negative pairs

$$\langle M.\pi_1,\, M.\pi_2 \rangle \equiv M \qquad \langle M, N \rangle.\pi_1 \equiv M \qquad \langle M, N \rangle.\pi_2 \equiv N$$

[&x : A. B] := TypeVal (&x : [[A]]. [[B]]) (λe : E. ⟨[A]<sub>∅</sub> e, [B]<sub>∅</sub>{x := [A]<sub>∅</sub> e} e⟩)

[⟨M, N⟩] := ⟨[M], [N]⟩

[M.π<sub>i</sub>] := [M].π<sub>i</sub>

Fig. 8. Exceptional translation of negative pairs

In the remainder of this section, we will focus on the specific case of pairs, but the same arguments are generalizable to arbitrary records. Positive pairs Σx : A. B are defined by the inductive type from Fig. 4. Negative pairs &x : A. B are defined as a primitive structure in Fig. 7. We use the ampersand notation as a reference to linear logic.

In CIC, it is possible to show that negative and positive pairs are propositionally isomorphic, because positive pairs enjoy dependent elimination. Nonetheless, it is a well-known fact in programming folklore that in a call-by-name language with effects, the two are sharply distinct. For instance, in the presence of exceptions, assuming M : Σx : A. B, one does not have in general

$$M \equiv \mathtt{ex}\ A\ B\ (\mathtt{fst}\ A\ B\ M)\ (\mathtt{snd}\ A\ B\ M)$$

where fst and snd are defined by pattern-matching. Indeed, if <sup>M</sup> is itself an exception, the two sides can be discriminated by a pattern-matching. Matching on the left-hand side results in immediate reraising of the exception, while matching on the right-hand side succeeds as long as the arguments of the constructor are not forced. Forcefully equating those two terms would then result in a trivial equational theory.

Such a phenomenon is at work in the exceptional translation. It is actually possible to interpret negative pairs through the translation, but in a way that significantly differs from the translation of positive pairs. In this section, we assume that T contains negative pairs.

Definition 42. *The translation of negative pairs is given in Fig. 8.*

It is straightforward to check that the definitions of Fig. 8 preserve the conversion and typing rules from Fig. 7. The same translation can be extended to any record. We thus have the following theorem.

Theorem 43. *If* T *has negative records, then so does* T<sub>E</sub>*.*

It is enlightening to look at the difference between negative and positive pairs through the translation, because now we have effects that allow us to separate them clearly. Indeed, compare

[[&x : A. B]] ≡ &x : [[A]]. [[B]]   with   [[Σx : A. B]] ≅ E + Σx : [[A]]. [[B]].

Clearly, if E is inhabited, then the two types do not even have the same cardinality, assuming A and B are finite. Furthermore, their default inhabitants are not the same at all: the default inhabitant is defined pointwise for negative pairs, while it is a dedicated constructor for positive ones. Finally, there is obviously no chance that [[Σx : A. B]] satisfies definitional surjective pairing in vanilla CIC, as it has two constructors. The trick is that the two types are externally distinguishable, but not internally so, because T<sub>E</sub> is a model of CIC + & and thus proves that they are propositionally isomorphic.
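The failure of definitional surjective pairing for the translated positive pair can be replayed concretely. In the hedged Coq sketch below (all names are ours), the translated Σ-type carries an extra error constructor, so an exceptional pair is provably distinct from the pair of its projections:

```coq
(* Illustrative names; E is an arbitrary inhabited exception type. *)
Inductive E : Type := e0 : E.

(* Sketch of the exceptional translation of the positive pair:
   one pairing constructor plus a default constructor. *)
Inductive sigE (A B : Type) : Type :=
| exE : A -> B -> sigE A B
| sig_err : E -> sigE A B.
Arguments exE {A B} _ _.
Arguments sig_err {A B} _.

(* fst/snd purify by pattern matching; on an exception they fall
   back on a provided default inhabitant of the component type. *)
Definition fstE {A B} (dflt : E -> A) (p : sigE A B) : A :=
  match p with exE a _ => a | sig_err e => dflt e end.
Definition sndE {A B} (dflt : E -> B) (p : sigE A B) : B :=
  match p with exE _ b => b | sig_err e => dflt e end.

(* An exceptional pair is distinguishable from the pair of its
   projections: surjective pairing fails for the positive pair. *)
Lemma no_surjective_pairing (dA dB : E -> nat) :
  sig_err e0 <> exE (fstE dA (sig_err e0)) (sndE dB (sig_err e0)).
Proof. discriminate. Qed.
```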

It is possible to equip negative pairs with a parametricity relation, defined as a primitive record which is the pointwise parametricity relation of each field; this naturally preserves the typing and conversion rules.

Theorem 44. *If* T *has negative records, then so does* T<sup>p</sup><sub>E</sub>*.*

#### 5.2 Impredicative Universe

All the systems we have considered so far are predicative. It is nonetheless possible to implement an impredicative universe ∗ in T<sub>E</sub> if T features one.

Intuitively, it is sufficient to ask for an inductive type prop living in □<sub>i</sub> for all i, which is defined just as type, except that its constructor PropVal, corresponding to TypeVal, contains elements of ∗ rather than □. Then one can similarly define El<sub>∗</sub> and Err<sub>∗</sub> acting on prop rather than type. One then slightly tweaks the [[·]] macro from Fig. 2 by defining it instead as

$$\llbracket A \rrbracket := \begin{cases} \mathrm{El}_{*}\ [A] & \text{if } A : * \\ \mathrm{El}\ [A] & \text{otherwise.} \end{cases}$$

and similarly for type constructors. With this modified translation, one obtains a soundness theorem for CCω.

Theorem 45. *The exceptional translation is a syntactic model of* CC<sup>ω</sup> + ∗*.*

Likewise, the inductive translation can interpret an impredicative universe, though with one major restriction.

Theorem 46. *The exceptional translation is a syntactic model of* CIC+∗ *without the singleton elimination rule.*

Indeed, the addition of the default constructor disrupts the singleton elimination criterion for all inductive types. Actually, this criterion is very fragile: even if T<sub>E</sub> satisfied it, Keller and Lasson showed that the parametricity translation could not interpret inductive types in ∗ for similar reasons [17], and T<sup>p</sup><sub>E</sub> would face the same issue.

### 6 The Exceptional Translation in Practice

### 6.1 Implementation as a Coq Plugin

The (parametric) exceptional translation is a translation of CIC into itself, which means that we can directly implement it as a Coq plugin. This way, we can use the translation to safely extend Coq with new logical principles, while type-checking remains decidable.

Such a Coq plugin is simply a program that, given a Coq proof term M, produces the translations [M] and [M]<sub>ε</sub> as Coq terms. For instance, the translations of the type list, given in Figs. 4 and 6, are obtained by typing the following commands, each of which defines one new inductive type in Coq.

```
Effect Translate list.
Parametricity Translate list.
```
The first command produces only [list], while the second produces [list]<sub>ε</sub>. But the main interest of the translation is that we can exhibit new constructors. For instance, the raise operation described in Sect. 2.4 is defined as

```
Effect Definition Exception : Type := fun E ⇒ TypeVal E E id.
Effect Definition raise : ∀ A, Exception → A := fun E (A : type E) ⇒ Err A.
```
### 6.2 Use Case: A Cast Framework

We can use the ability to raise exceptions to define partial functions in the exceptional layer. For instance, given a decidable property (described by the type class below), it is possible to define a cast function from A to Σ (a : A). P a, returning the converted value if the property is satisfied and raising an exception otherwise (using an inhabitant cast\_failed of Exception).

```
Class Decidable (A : Type) := dec : A + (not A).
Definition cast A (P : A → Type) (a:A) {Hdec : Decidable (P a)} : Σ (a : A). P a
:= match dec (P a) with
   | inl p ⇒ (a ; p)
   | inr _ ⇒ raise cast_failed
   end.
```
Using this cast mechanism, it is easy to define a function list\_to\_pair from lists to pairs by first converting the list into a list of size two, using the impure function cast (list A) (fun l ⇒ List.length l = 2), and then recovering a pair from a list of size two using a pure function.
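For concreteness, here is one direct way such a function could be written; this is a sketch rather than the paper's definition (which goes through cast), and it assumes the raise and cast\_failed primitives from Sect. 6.1:

```coq
(* A hedged, direct sketch: lists of length two yield a pair,
   anything else raises. raise and cast_failed are assumed to be
   the primitives from Sect. 6.1, so this is not self-contained. *)
Definition list_to_pair {A} (l : list A) : A * A :=
  match l with
  | cons x (cons y nil) => (x, y)
  | _ => raise (A * A) cast_failed
  end.
```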

In the exceptional layer, it is possible to prove the following property

```
Definition list_to_pair_prop A (x y : A) : list_to_pair [x ; y]=(x,y).
```
in at least two ways. One can perfectly prove it by simply raising an exception at top level, or by reflexivity, using the fact that list\_to\_pair [x ; y] actually reduces to (x,y).

However, there is a way to distinguish between those two proofs in the target theory (here Coq), by stating the following lemma, which can only be proven for the proof not raising an exception.

Definition list\_to\_pair\_prop\_soundness A x y : list\_to\_pair\_prop<sup>•</sup> A x y = eq\_refl<sup>•</sup> \_ \_ \_ := eq\_refl \_.

where underscores represent arguments inferred by Coq.

### 7 Related Work

*Adding Dependency to an Effectful Language.* There are numerous works on adding dependent types to mainstream effectful programming languages. They have all mostly focused on how to appropriately restrict effectful terms from appearing in types. Indeed, if types only depend on pure terms, the problem of having two different evaluations of the effect of a term (at the level of types and at the level of terms) disappears. This is the case for instance for Dependent ML of Xi and Pfenning [18], or more recently for the work of Casinghino *et al.* [19] on combining proofs and programs when programs can be non-terminating. The F* programming language of Swamy *et al.* [20] uses a notion of primitive effects including state, exceptions, divergence and IO. Each effect is described through a monadic predicate-transformer semantics, which allows a pure core dependent language to reason about those effects. On a more foundational side, there are two recent and overlapping lines of work on the description of a dependent call-by-push-value (CBPV) by Ahman *et al.* [21] and Vákár [22]. Those works also use a purity restriction for dependency but, using the CBPV language, deal with any effect described in monadic style. In another line of work, Brady advocates the use of algebraic effects as an elegant way to allow combining effects more smoothly than with a monadic approach, and gives an implementation in Idris [23].

*Adding Effects to a Dependently-Typed Language.* Nanevski *et al.* [24] have developed Hoare type theory (HTT) to extend Coq with monadic-style effects. To this end, they provide an axiomatic extension of Coq with a monad in which to encapsulate imperative code. Important tools have been developed on top of HTT, most notably the Ynot project [25]. Apart from being axiomatic, their monadic approach does not allow mixing effectful programs and dependency; it is rather designed for proving, inside Coq, properties of simply-typed imperative programs.

*Internal Translation of Type Theory.* A non-axiomatic way to extend type theory with new features is to use internal translations, that is, translations of type theory into itself, as advocated by Boulier *et al.* [9]. The presentation of parametricity for type theory given by Bernardy and Lasson [5] can be seen as one of the first internal translations of type theory. However, it does not add any new power to type theory, as it is a conservative extension. Barthe *et al.* [26] have described a CPS translation for CC<sup>ω</sup> featuring call/cc, but without dealing with inductive types and relying on a form of type stratification. A variant of this translation has recently been extended by Bowman *et al.* [27] to dependent sums using answer-type polymorphism Πα : □. (A → α) → α. A generic class of internal translations has been defined by Jaber *et al.* [28] using forcing, which can be seen as a type-theoretic version of the presheaf construction used in categorical logic. This class of translations works on all of CIC but for a restricted version of dependent elimination, identical to the Baclofen type theory [2]. Therefore, to the best of our knowledge, the exceptional translation is the first complete internal translation of CIC adding a particular notion of effect.

### 8 Conclusion and Future Work

In this paper, we have defined the exceptional translation, the first syntactic translation of the Calculus of Inductive Constructions into itself that adds effects and covers full dependent elimination. This results in a new type theory, which features call-by-name exceptions with decidable type-checking and a weaker form of canonicity. We have shown that although the resulting theory is inconsistent, it is possible to reason about exceptional programs and show that some of them actually never raise an exception, by relying on the target theory. This provides a sound logical framework for transparently proving safety properties about impure dependently-typed programs. Then, using parametricity, we have added a layer on top of the exceptional translation in order to tame exceptions and preserve consistency. This way, we have consistently extended the logical expressivity of CIC with independence of premises, Markov's rule, and the negation of function extensionality while retaining η-expansion. Both translations have been implemented in a Coq plugin, which we use to formalize the examples.

One of the main directions for future work is to investigate whether other kinds of effects can give rise to an internal translation of CIC. To that end, it seems promising to look at algebraic presentations of effects. Indeed, the recent work of Kammar and Pretnar [29] on the non-necessity of the value restriction for algebraic effects and handlers suggests that we should be able to perform similar translations of CIC with full dependent elimination for algebraic effects and handlers other than exceptions.

Acknowledgements. This research was supported in part by an ERC Consolidator Grant for the project "RustBelt", funded under Horizon 2020 grant agreement № 683289 and an ERC Starting Grant for the project "CoqHoTT", funded under Horizon 2020 grant agreement № 637339.

### References



The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Let Arguments Go First**

Ningning Xie and Bruno C. d. S. Oliveira

The University of Hong Kong, Pokfulam, Hong Kong {nnxie,bruno}@cs.hku.hk

**Abstract.** Bi-directional type checking has proved to be an extremely useful and versatile tool for type checking and type inference. The conventional presentation of bi-directional type checking consists of two modes: *inference* mode and *checked* mode. In traditional bi-directional type-checking, type annotations are used to guide (via the checked mode) the type inference/checking procedure to determine the type of an expression, and *type information flows from functions to arguments*.

This paper presents a variant of bi-directional type checking where the *type information flows from arguments to functions*. This variant retains the inference mode, but adds a so-called *application* mode. Such a design can remove annotations that basic bi-directional type checking cannot, and is useful when type information from arguments is required to type-check the functions being applied. We present two applications and develop the meta-theory (mostly verified in Coq) of the application mode.

### **1 Introduction**

Bi-directional type checking has been known in the folklore of type systems for a long time. It was popularized by Pierce and Turner's work on *local type inference* [29]. Local type inference was introduced as an alternative to Hindley-Milner (henceforth HM system) type systems [11,17], which could not easily deal with polymorphic languages with subtyping. Bi-directional type checking is one component of local type inference that, aided by some type annotations, enables type inference in an expressive language with polymorphism and subtyping. Since Pierce and Turner's work, various other authors have proved the effectiveness of bi-directional type checking in several other settings, including many different systems with subtyping [12,14,15], systems with dependent types [2,3,10,21,37], and various other works [1,7,13,22,28]. Furthermore, bi-directional type checking has also been combined with HM-style techniques for providing type inference in the presence of higher-ranked types [14,27].

The key idea in bi-directional type checking is simple. In its basic form, typing is split into *inference* and *checked* modes. The most salient feature of a bi-directional type checker is when information deduced from the inference mode is used to guide checking of an expression in checked mode. One such interaction between modes happens in the typing rule for function applications:

$$\frac{\Gamma \vdash e_1 \Rightarrow A \to B \qquad \Gamma \vdash e_2 \Leftarrow A}{\Gamma \vdash e_1\ e_2 \Rightarrow B}\ \text{APP}$$

© The Author(s) 2018. A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 272–299, 2018. https://doi.org/10.1007/978-3-319-89884-1\_10

In the above rule, which is a standard bi-directional rule for checking applications, the two modes are used. First we synthesize (⇒) the type A → B from e<sub>1</sub>, and then check (⇐) e<sub>2</sub> against A, returning B as the type of the application.

This paper presents a variant of bi-directional type checking that employs a so-called *application* mode. With the application mode the design of the application rule (for a simply typed calculus) is as follows:

$$\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma\,|\,\Psi, A \vdash e_1 \Rightarrow A \to B}{\Gamma\,|\,\Psi \vdash e_1\ e_2 \Rightarrow B}\ \text{APP}$$

In this rule, there are two kinds of judgments. The first judgment is just the usual inference mode, which is used to infer the type of the argument e2. The second judgment, the application mode, is similar to the inference mode, but it has an additional context Ψ. The context Ψ is a stack that tracks the types of the arguments of outer applications. In the rule for application, the type of the argument e<sup>2</sup> is inferred first, and then pushed into Ψ for inferring the type of e1. Applications are themselves in the application mode, since they can be in the context of an outer application. With the application mode it is possible to infer the type for expressions such as (λx. x) 1 without additional annotations.
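To make the flow of types concrete, here is a minimal Coq sketch of an application-mode checker for a simply typed calculus with integer literals; this is our illustrative reconstruction, not the paper's algorithm:

```coq
(* Application-mode type inference for a tiny STLC with integers.
   Psi is the stack of argument types; all names are ours. *)
Inductive ty : Type := TInt | TArr (a b : ty).
Inductive tm : Type :=
| Lit (n : nat) | Var (x : nat) | Lam (e : tm) | App (e1 e2 : tm).

Fixpoint ty_eqb (a b : ty) : bool :=
  match a, b with
  | TInt, TInt => true
  | TArr a1 b1, TArr a2 b2 => ty_eqb a1 a2 && ty_eqb b1 b2
  | _, _ => false
  end.

(* de Bruijn context *)
Fixpoint lookup (G : list ty) (x : nat) : option ty :=
  match G, x with
  | t :: _, O => Some t
  | _ :: G', S x' => lookup G' x'
  | nil, _ => None
  end.

(* An inferred type is compatible with the stack Psi when Psi is a
   prefix of its arrow spine. *)
Fixpoint compat (t : ty) (Psi : list ty) : bool :=
  match Psi with
  | nil => true
  | A :: Psi' =>
      match t with
      | TArr A' B => ty_eqb A A' && compat B Psi'
      | TInt => false
      end
  end.

Fixpoint infer (G Psi : list ty) (e : tm) : option ty :=
  match e with
  | Lit _ => if compat TInt Psi then Some TInt else None
  | Var x =>
      match lookup G x with
      | Some t => if compat t Psi then Some t else None
      | None => None
      end
  (* unannotated lambda: pop the argument type from Psi *)
  | Lam e' =>
      match Psi with
      | A :: Psi' =>
          match infer (A :: G) Psi' e' with
          | Some B => Some (TArr A B)
          | None => None
          end
      | nil => None
      end
  (* application: infer the argument, then push its type onto Psi *)
  | App e1 e2 =>
      match infer G nil e2 with
      | Some A =>
          match infer G (A :: Psi) e1 with
          | Some (TArr A' B) => if ty_eqb A A' then Some B else None
          | _ => None
          end
      | None => None
      end
  end.

(* (λx. x) 1 infers without any annotation on x. *)
Example ex1 : infer nil nil (App (Lam (Var 0)) (Lit 1)) = Some TInt.
Proof. reflexivity. Qed.
```

In ex1, the argument type TInt is inferred first, pushed onto the stack, and popped at the unannotated lambda, mirroring the APP rule above.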

Bi-directional type checking with an application mode may still require type annotations, and it offers different trade-offs from the checked mode in terms of which annotations are needed. However, the different trade-offs open paths to different designs of type checking/inference algorithms. To illustrate the utility of the application mode, we present two different calculi as applications. The first calculus is a type system with implicit higher-ranked polymorphism, which infers higher-ranked types, generalizes the HM type system, and has polymorphic **let** as syntactic sugar. As far as we are aware, no previous work enables an HM-style let construct to be expressed as syntactic sugar. For this calculus many results, including type safety, are proved using the Coq proof assistant [9]. Moreover, a sound and complete algorithmic system, inspired by Peyton Jones et al. [27], is also developed. A second calculus with *explicit polymorphism* illustrates how the application mode is compatible with type applications, and how it adds expressiveness by enabling an encoding of type declarations in a System-F-like calculus. For this calculus, all proofs (including type soundness) are mechanized in Coq.

We believe that, like standard bi-directional type checking, bi-directional type checking with an application mode can be applied to a wide range of type systems. Our work shows two particular and non-trivial applications. Other potential areas of application are type systems with subtyping, static overloading, implicit parameters, or dependent types.

In summary the contributions of this paper are<sup>1</sup>:


### **2 Overview**

### **2.1 Background: Bi-directional Type Checking**

Traditional type checking rules can be heavyweight on annotations, in the sense that lambda-bound variables always need explicit annotations. Bi-directional type checking [29] provides an alternative, which allows types to propagate down the syntax tree. For example, in the expression (λf:Int → Int. f) (λy. y), the type of y is provided by the type annotation on f. This is supported by the bi-directional typing rule for applications:

$$\frac{\Gamma \vdash e_1 \Rightarrow A \to B \qquad \Gamma \vdash e_2 \Leftarrow A}{\Gamma \vdash e_1 \; e_2 \Rightarrow B} \; \text{App}$$

Specifically, if we know that e1 has a function type A → B, we can check that e2 has type A. Notice that here the type information flows from functions to arguments.

One guideline for designing bi-directional type checking rules [15] is to distinguish introduction rules from elimination rules. Constructs which correspond to introduction forms are *checked* against a given type, while constructs corresponding to elimination forms *infer* (or synthesize) their types. For instance, under this design principle, the introduction rule for pairs is supposed to be in checked mode, as in the rule Pair-C.

$$\frac{\Gamma \vdash e_1 \Leftarrow A \qquad \Gamma \vdash e_2 \Leftarrow B}{\Gamma \vdash (e_1, e_2) \Leftarrow (A, B)} \; \text{Pair-C} \qquad\qquad \frac{\Gamma \vdash e_1 \Rightarrow A \qquad \Gamma \vdash e_2 \Rightarrow B}{\Gamma \vdash (e_1, e_2) \Rightarrow (A, B)} \; \text{Pair-I}$$

<sup>1</sup> All supplementary materials are available at https://bitbucket.org/ningningxie/let-arguments-go-first.

Unfortunately, this means that the trivial program (1, 2) cannot type-check; it has to be rewritten as (1, 2) : (Int, Int).

In this particular case, bi-directional type checking goes against its original intention of removing burden from programmers, since a seemingly unnecessary annotation is needed. Therefore, in practice, bi-directional type systems do not strictly follow the guideline, and usually have additional inference rules for the introduction form of constructs. For pairs, the corresponding rule is Pair-I.

Now we can type-check (1, 2), but the price to pay is that two typing rules for pairs are needed. Worse still, the same criticism applies to other constructs. This shows one drawback of bi-directional type checking: to minimize annotations, many rules are duplicated so that constructs have both an inference and a checked mode, and this duplication scales with the number of typing rules in the system.

### **2.2 Bi-directional Type Checking with the Application Mode**

We propose a variant of bi-directional type checking with a new *application mode*. The application mode preserves the advantage of bi-directional type checking, namely that many redundant annotations are removed, while certain programs type-check with even fewer annotations. Moreover, with our proposal the inference mode is a special case of the application mode, so it does not produce duplicated rules in the type system. Additionally, the checked mode can still be *easily* combined into the system (see Sect. 5.1 for details). The essential idea of the application mode is to enable type information in applications to flow from arguments to functions (instead of from functions to arguments, as in traditional bi-directional type checking).

To motivate the design of bi-directional type checking with an application mode, consider the simple expression

(λx. x) 1

This expression cannot type check in traditional bi-directional type checking because unannotated abstractions only have a checked mode, so annotations are required. For example, ((λx. x) : Int → Int) 1.

In this example we can observe that if the type of the argument is accounted for in inferring the type of λx. x, then it is actually possible to deduce that the lambda expression has type Int → Int , from the argument 1.

*The Application Mode.* If types flow from the arguments to the function, an alternative idea is to push the types of the arguments into the typing of the function, as in the rule briefly introduced in Sect. 1:

$$\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma \vdash_{\Psi, A} e_1 \Rightarrow A \to B}{\Gamma \vdash_{\Psi} e_1 \; e_2 \Rightarrow B} \; \text{App}$$

Here the argument e2 synthesizes its type A, which is then pushed onto the application context Ψ. Lambda expressions can now make use of the application context, leading to the following rule:

$$\frac{\Gamma, x:A \vdash_{\Psi} e \Rightarrow B}{\Gamma \vdash_{\Psi, A} \lambda x.\ e \Rightarrow A \to B} \; \text{Lam}$$

The type A that appears last in the application context serves as the type for x, and type checking continues with a smaller application context and x:A in the typing context. Therefore, using the rules App and Lam, the expression (λx. x) 1 can type-check without annotations, since the type Int of the argument 1 is used as the type of the binding x.
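The App and Lam rules can be rendered as a small executable sketch (an illustrative Python encoding under our own assumptions, not the paper's formal system; types are nested tuples, and the application context `psi` is a Python list used as a stack whose top is the last element):

```python
def infer(ctx, psi, e):
    """Infer e's type under typing context ctx and application context psi."""
    tag = e[0]
    if tag == "lit":
        if psi:                          # an integer cannot accept arguments
            raise TypeError("an integer cannot be applied")
        return "Int"
    if tag == "var":
        t = rest = ctx[e[1]]
        for a in reversed(psi):          # check expected arguments, top first
            if not (isinstance(rest, tuple) and rest[0] == "->" and rest[1] == a):
                raise TypeError("argument type mismatch")
            rest = rest[2]
        return t
    if tag == "lam":                     # Lam: pop the argument type for x
        _, x, body = e
        if not psi:
            raise TypeError("unapplied, unannotated lambda needs an annotation")
        a = psi[-1]
        b = infer({**ctx, x: a}, psi[:-1], body)
        return ("->", a, b)
    if tag == "app":
        _, e1, e2 = e
        a = infer(ctx, [], e2)           # App: infer the argument first ...
        f = infer(ctx, psi + [a], e1)    # ... then push its type onto psi
        return f[2]                      # f = ("->", a, B); return B
    raise TypeError(tag)

# (λx. x) 1 now infers Int with no annotation: the argument type Int
# is pushed onto psi and popped as the type of x.
print(infer({}, [], ("app", ("lam", "x", ("var", "x")), ("lit", 1))))  # Int
```

Note how the checker never guesses the type of x: the argument's type is inferred first and flows to the lambda through `psi`.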

Note that, since the examples so far are based on simple types, obviously they can be solved by integrating type inference and relying on techniques like unification or constraint solving. However, here the point is that the application mode helps to reduce the number of annotations *without requiring such sophisticated techniques*. Also, the application mode helps with situations where those techniques cannot be easily applied, such as type systems with subtyping.

*Interpretation of the Application Mode.* As we have seen, the guideline for designing bi-directional type checking [15], based on introduction and elimination rules, is often not enough in practice, leading to extra introduction rules in the inference mode. The application mode does not distinguish between introduction and elimination rules. Instead, to decide whether a rule should be in inference or application mode, we consider whether the expression can be applied. Variables, lambda expressions, and applications can all be applied, and they should have application-mode rules; pairs and literals cannot be applied and should have inference rules. For example, type checking pairs would simply lead to the rule Pair-I. Nevertheless, elimination rules for pairs can have non-empty application contexts (see Sect. 5.2 for details). In the application mode, arguments are always inferred first in applications and propagated through application contexts. An empty application context means that an expression is not being applied to anything, which allows us to model the inference mode as a particular case<sup>2</sup>.

*Partial Type Checking.* The inference mode synthesizes the type of an expression, and the checked mode checks an expression against some type. A natural question is how these modes compare to the application mode. The answer is that, in some sense, the application mode is stronger than the inference mode but weaker than the checked mode. Specifically, in the inference mode we know nothing about the type of an expression beforehand, while in the checked mode the whole type of the expression is already known beforehand. With the application mode we know some partial information about the type of an expression:

<sup>2</sup> Although the application mode generalizes the inference mode, we refer to them as two different modes. Thus the variant of bi-directional type checking in this paper is interpreted as a type system with both *inference* and *application* modes.

we know some of its argument types (since it must be a function type when the application context is non-empty), but not the return type.

Instead of all or nothing, this partiality gives us a finer-grained notion of how much we know about the type of an expression. For example, assume e : A → B → C. In the inference mode, we only have e. In the checked mode, we have both e and A → B → C. In the application mode, we have e and either an empty application context (which degenerates into the inference mode), an application context A (we know the type of the first argument), or an application context B, A (we know the types of both arguments).

*Trade-offs.* Note that the application mode is *not* conservative over traditional bi-directional type checking, due to the different information flow. However, it provides a new design choice for type inference/checking algorithms, especially for those where information about the arguments is useful. We therefore next discuss some benefits of the application mode for two interesting cases: where functions are variables, and where they are lambda (or type) abstractions.

### **2.3 Benefits of Information Flowing from Arguments to Functions**

*Local Constraint Solver for Function Variables.* Many type systems, including type systems with *implicit polymorphism* and/or *static overloading*, need information about the types of the arguments when type checking function variables. For example, in conventional functional languages with implicit polymorphism, function calls such as (id 3), where id : ∀a. a → a, are *pervasive*. In such a function call the type system must instantiate a to Int. Dealing with such implicit instantiation gets trickier in systems with *higher-ranked types*. For example, Peyton Jones et al. [27] require additional syntactic forms and relations, whereas Dunfield and Krishnaswami [14] add a special-purpose *application judgment*.

With the application mode, all the type information about the arguments being applied is available in application contexts and can be used to solve instantiation constraints. To exploit such information, the type system employs a special subtyping judgment called *application subtyping*, of the form Ψ ⊢ A ≤ B. Unlike conventional subtyping, computationally Ψ and A are interpreted as inputs and B as output. In the above example, we have that Int ⊢ ∀a. a → a ≤ B, and we can determine that a = Int and B = Int → Int. In this way, the type system is able to solve the constraints *locally* according to the application contexts, since we no longer need to propagate instantiation constraints to the rest of the typing process.
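The local flavor of application subtyping can be sketched as follows (our own illustrative Python encoding, not the paper's algorithm: quantifiers are opened naively without renaming, and instantiation is one-way matching of type variables against the argument stack; types are `"Int"`, `("tv", a)`, `("->", A, B)`, and `("all", a, A)`):

```python
def app_subtype(psi, t):
    """Given argument types psi (top of stack = last element) and a possibly
    polymorphic type t, solve instantiations locally and return the result."""
    subst = {}

    def match(a, b):                     # one-way matching: solve vars in t
        if isinstance(a, tuple) and a[0] == "tv":
            if a[1] in subst and subst[a[1]] != b:
                raise TypeError("conflicting instantiation")
            subst[a[1]] = b
        elif isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == "->":
            match(a[1], b[1])
            match(a[2], b[2])
        elif a != b:
            raise TypeError("mismatch")

    def apply_subst(a):
        if isinstance(a, tuple) and a[0] == "tv":
            return subst.get(a[1], a)
        if isinstance(a, tuple) and a[0] == "->":
            return ("->", apply_subst(a[1]), apply_subst(a[2]))
        return a

    while isinstance(t, tuple) and t[0] == "all":   # open the quantifiers
        t = t[2]
    rest = t
    for arg in reversed(psi):            # match each argument type, top first
        match(rest[1], arg)
        rest = rest[2]
    return apply_subst(t)

# Int ⊢ ∀a. a → a ≤ B : a is solved to Int locally, giving B = Int → Int.
print(app_subtype(["Int"], ("all", "a", ("->", ("tv", "a"), ("tv", "a")))))
```

The instantiation is resolved entirely from the application context, with no interaction with the typing judgment.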

*Declaration Desugaring for Lambda Abstractions.* An interesting consequence of the usage of an application mode is that it enables the following **let** sugar:

**let** x = e1 **in** e2 ⇝ (λx. e2) e1

Such syntactic sugar for **let** is, of course, standard. However, in the context of implementations of typed languages it normally requires extra type annotations or a more sophisticated type-directed translation: type checking (λx. e2) e1 would normally require an annotation (for example, an annotation for x), or otherwise such an annotation should be inferred first. Nevertheless, with the application mode no extra annotations or inference are required, since from the type of the argument e1 it is possible to deduce the type of x. Generally speaking, with the application mode *annotations are never needed for applied lambdas*. Thus **let** can be the usual sugar from the untyped lambda calculus, supporting HM-style **let** expressions and even type declarations.
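The desugaring itself is a one-liner (terms as tagged tuples; the representation is our own illustrative encoding):

```python
def desugar_let(x, e1, e2):
    """let x = e1 in e2  ⇝  (λx. e2) e1"""
    return ("app", ("lam", x, e2), e1)

# let id = λy. y in id 1  becomes  (λid. id 1) (λy. y);
# under the application mode, no annotation on id is needed.
print(desugar_let("id", ("lam", "y", ("var", "y")),
                  ("app", ("var", "id"), ("lit", 1))))
```

The point is that the desugared term is typeable as-is: the type of e1 flows to x through the application context, so **let** needs no typing rule of its own.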

### **2.4 Application 1: Type Inference of Higher-Ranked Types**

As a first illustration of the utility of the application mode, we present a calculus with *implicit predicative higher-ranked polymorphism*.

*Higher-Ranked Types.* Type systems with higher-ranked types generalize the traditional HM type system, and are useful in practice in languages like Haskell and other ML-like languages. Essentially, higher-ranked types enable much of the expressive power of System F, with the advantage of implicit polymorphism. Complete type inference for System F is known to be undecidable [36]. Therefore, several partial type inference algorithms exploiting additional type annotations have been proposed instead [15,25,27,31].

*Higher-Ranked Types and Bi-directional Type Checking.* Bi-directional type checking is also used to help with the inference of higher-ranked types [14,27]. Consider the following program:

(λf. (f 1, f 'c')) (λx. x)

which is not typeable under those type systems because they fail to infer the type of f, since it is supposed to be polymorphic. Using bi-directional type checking, we can rewrite this program as

((λf. (f 1, f 'c')) : (∀a. a → a) → (Int, Char)) (λx . x)

Here the type of f can be easily derived from the type signature using the checked mode of bi-directional type checking. However, although some redundant annotations are removed by bi-directional type checking, the burden of inferring higher-ranked types is still carried by programmers: they are forced to add polymorphic annotations to help with the type derivation of higher-ranked types. For the above example, the type annotation is still *provided by programmers*, even though the necessary type information can be derived intuitively without any annotations: f is applied to λx. x, which is of type ∀a. a → a.

*Generalization.* Generalization is famous for its application in let polymorphism in the HM system, where generalization is adopted at let bindings. Let polymorphism is a useful component to introduce top-level quantifiers (rank 1 types) into a polymorphic type system. The previous example becomes typeable in the HM system if we rewrite it to: **let** f = λx. x **in** (f 1, f 'c').

*Type Inference for Higher-Ranked Types with the Application Mode.* Using our bi-directional type system with an application mode, the original expression can type check without annotations or rewrites: (λf. (f 1, f 'c')) (λx. x).

This result comes naturally if we allow type information to flow from arguments to functions. For inferring polymorphic types for arguments, we use *generalization*. In the above example, we first infer the type ∀a. a → a for the argument, and then pass that type to the function. A nice consequence of such an approach is that HM-style polymorphic **let** expressions are simply regarded as syntactic sugar for a combination of lambda and application:

**let** x = e1 **in** e2 ⇝ (λx. e2) e1

With this approach, nested lets can lead to types that are *more general* than HM. For example, consider **let** s = λx. x **in let** t = λy. s **in** e. The type of s is ∀a. a → a after generalization. Because t returns s as a result, we might expect t : ∀b. b → (∀a. a → a), which is what our system will return. However, HM will return t : ∀b. ∀a. b → (a → a), as it can only return rank-1 types, which is less general than the previous type according to Odersky and Läufer's subtyping relation for polymorphic types [24].

*Conservativity over the Hindley-Milner Type System.* Our type system is a conservative extension over the Hindley-Milner type system, in the sense that every program that can type-check in HM is accepted in our type system, which is explained in detail in Sect. 3.2. This result is not surprising: after desugaring **let** into a lambda and an application, programs remain typeable.

*Comparing Predicative Higher-Ranked Type Inference Systems.* We will give a full discussion and comparison of related work in Sect. 6. Among those works, we believe the work by Dunfield and Krishnaswami [14], and the work by Peyton Jones et al. [27] are the most closely related work to our system. Both their systems and ours are based on a *predicative* type system: universal quantifiers can only be instantiated by monotypes. So we would like to emphasize our system's properties in relation to those works. In particular, here we discuss two interesting differences, and also briefly (and informally) discuss how the works compare in terms of expressiveness.

(1) Inference of higher-ranked types. In both works, every polymorphic type inferred by the system must correspond to one annotation provided by the programmer. However, in our system, some higher-ranked types can be inferred from the expression itself without any annotation. The motivating expression above provides an example of this.

(2) Where are annotations needed? Since type annotations are useful for inferring higher-ranked types, a clear answer to the question of where annotations are needed is necessary, so that programmers know when they are required to write annotations. To this question, previous systems give a concrete answer: only on the bindings of polymorphic types. Our answer is slightly different: only on the bindings of polymorphic types in abstractions *that are not applied to arguments*. Roughly speaking, this means that our system ends up with fewer or smaller annotations.

(3) Expressiveness. Based on these two answers, it may seem that our system should accept all expressions that are typeable in their system. However, this is not true, because the application mode is *not* conservative over traditional bi-directional type checking. Consider the expression (λf:(∀a. a → a) → (Int, Char). f) (λg. (g 1, g 'a')), which is typeable in their system. In this case, even if g is a polymorphic binding without a type annotation, the expression can still type-check, because the original application rule propagates the information from the outer binding into the inner expressions. Note that the fact that such an expression type-checks does not contradict their guideline of providing type annotations for every polymorphic binder. Programmers who strictly follow their guideline can still add a polymorphic type annotation for g. However, it does mean that it is a little harder to understand where annotations for polymorphic binders can be *omitted* in their system: this requires understanding how applications in checked mode operate.

In our system the above expression is not typeable, as a consequence of the information flow in the application mode. However, following our guideline for annotations leads to a program that can be type-checked with a smaller annotation: (λf. f) (λg:(∀a. a → a). (g 1, g 'a')). This means that our work is not conservative over their work, which is due to the design choice of the application typing rule. Nevertheless, we can always rewrite programs using our guideline, which often leads to fewer/smaller annotations.

### **2.5 Application 2: More Expressive Type Applications**

The design choice of propagating arguments to functions was subject to consideration in the original work on local type inference [29], but was rejected due to possible non-determinism introduced by explicit type applications:

*"It is possible, of course, to come up with examples where it would be beneficial to synthesize the argument types first and then use the resulting information to avoid type annotations in the function part of an application expression....Unfortunately this refinement does not help infer the type of polymorphic functions. For example, we cannot uniquely determine the type of* x *in the expression* (fun[X](x) e) [Int] 3." [29]

Therefore, as a response to this challenge, our second application is a variant of System F. Our development of the calculus shows that the application mode can actually work well with calculi with explicit type applications. To explain the new design, consider the expression:

(Λa. λx : a. x + 1) Int

which is not typeable in the traditional type system for System F. In System F the lambda abstractions do not account for the context of possible function applications. Therefore when type checking the inner body of the lambda abstraction, the expression x+1 is ill-typed, because all that is known is that x has the (abstract) type a.

If we are allowed to propagate type information from arguments to functions, then we can verify that a = Int and x+1 is well-typed. The key insight in the new type system is to use application contexts to track type equalities induced by type applications. This enables us to type check expressions such as the body of the lambda above (x+1). Therefore, back to the problematic expression (fun[X](x) e) [Int] 3, the type of x can be inferred as either X or Int since they are actually equivalent.
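The idea of tracking equalities induced by type applications can be sketched as a tiny checker (our own illustrative Python encoding, not the paper's formal system: here the equality a = Int is realized as an eager substitution performed when a type abstraction is directly applied; types are `"Int"`, `("tv", a)`, and `("->", A, B)`):

```python
def subst_ty(t, a, s):
    """Replace type variable a by s in type t."""
    if t == ("tv", a):
        return s
    if isinstance(t, tuple) and t[0] == "->":
        return ("->", subst_ty(t[1], a, s), subst_ty(t[2], a, s))
    return t

def subst_term(e, a, s):
    """Replace type variable a by s in the annotations of term e."""
    tag = e[0]
    if tag == "lam":                     # λx : A. e
        return ("lam", e[1], subst_ty(e[2], a, s), subst_term(e[3], a, s))
    if tag in ("app", "add"):
        return (tag, subst_term(e[1], a, s), subst_term(e[2], a, s))
    return e

def infer(ctx, e):
    tag = e[0]
    if tag == "lit":
        return "Int"
    if tag == "var":
        return ctx[e[1]]
    if tag == "add":                     # x + 1 requires Int operands
        assert infer(ctx, e[1]) == "Int" and infer(ctx, e[2]) == "Int"
        return "Int"
    if tag == "lam":
        _, x, a, body = e
        return ("->", a, infer({**ctx, x: a}, body))
    if tag == "tyapp" and e[1][0] == "tylam":   # (Λa. e) A : use a = A
        (_, (_, a, body), s) = e
        return infer(ctx, subst_term(body, a, s))
    if tag == "app":
        f = infer(ctx, e[1])
        assert f[0] == "->" and f[1] == infer(ctx, e[2])
        return f[2]
    raise TypeError(tag)

# (Λa. λx : a. x + 1) Int is accepted: inside the body, a = Int.
e = ("tyapp", ("tylam", "a", ("lam", "x", ("tv", "a"),
                              ("add", ("var", "x"), ("lit", 1)))), "Int")
print(infer({}, e))  # ('->', 'Int', 'Int')
```

An *unapplied* Λa would not reach the `tyapp` case, so its body is checked with a abstract, preserving type abstraction in the sense discussed below.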

*Sugar for Type Synonyms.* In the same way that we can regard **let** expressions as syntactic sugar, in the new type system we further *gain built-in type synonyms for free*. A *type synonym* is a new name for an existing type. Type synonyms are common in languages such as Haskell. In our calculus a simple form of type synonyms can be desugared as follows:

**type** a = A **in** e ⇝ (Λa. e) A

One practical benefit of such syntactic sugar is that it enables a direct encoding of a System F-like language with declarations (including type-synonyms). Although declarations are often viewed as a routine extension to a calculus, and are not formally studied, they are highly relevant in practice. Therefore, a more realistic formalization of a programming language should directly account for declarations. By providing a way to encode declarations, our new calculus enables a simple way to formalize declarations.

*Type Abstraction.* The type equalities introduced by type applications may suggest that we are breaking System F's type abstraction. However, we argue that *type abstraction* is still supported by our System F variant. For example:

**let** inc = Λa. λx : a. x + 1 **in** inc Int e

(after desugaring) does *not* type-check, just as in a System F-like language. In our type system, lambda abstractions that are immediately applied to an argument and unapplied lambda abstractions behave differently. Unapplied lambda abstractions are just like System F abstractions and retain type abstraction; the example above illustrates this. In contrast, the typeable example (Λa. λx : a. x + 1) Int, which uses a lambda abstraction directly applied to an argument, can be regarded as the desugared form of **type** a = Int **in** λx:a. x+1.

### **3 A Polymorphic Language with Higher-Ranked Types**

This section first presents a declarative, *syntax-directed* type system for a lambda calculus with implicit higher-ranked polymorphism. The interesting aspects of the new type system are: (1) the typing rules, which employ a combination of inference and application modes; (2) the novel subtyping relation under an application context. We then prove our type system type-safe by a type-directed translation to System F [16,27] in Sect. 3.4. Finally, an algorithmic type system is discussed in Sect. 3.5.

### **3.1 Syntax**

The syntax of the language is:

    Expressions           e ::= x | n | λx : A. e | λx. e | e1 e2
    Types                 A, B ::= a | Int | A → B | ∀a. A
    Typing contexts       Γ ::= ∅ | Γ, x : A
    Application contexts  Ψ ::= ∅ | Ψ, A
*Expressions.* Expressions e include variables (x), integers (n), annotated lambda abstractions (λx : A. e), lambda abstractions (λx. e), and applications (e1 e2). Letters x, y, z are used to denote term variables. Notably, the syntax does not include a **let** expression (**let** x = e1 **in** e2). Let expressions can be regarded as the standard syntactic sugar (λx. e2) e1, as illustrated in more detail later.

*Types.* Types include type variables (a), functions (A → B), polymorphic types (∀a.A) and integers (Int). We use capital letters (A, B) for types, and small letters (a, b) for type variables. Monotypes are types without universal quantifiers.

*Contexts.* Typing contexts Γ are standard: they map a term variable x to its type A. We implicitly assume that all the variables in Γ are distinct. The main novelty lies in the *application contexts* Ψ, which are the main data structure needed to allow types to flow from arguments to functions. Application contexts are modeled as a stack. The stack collects the types of arguments in applications. The context is a stack because a type that is pushed last is popped first. For example, inferring expression e under application context (a, Int) means e is being applied to two arguments e1 and e2, with e1 : Int and e2 : a, so e should be of type Int → a → A for some A.

### **3.2 Type System**

The top part of Fig. 1 gives the typing rules for our language. The judgment Γ ⊢Ψ e ⇒ B is read as: under typing context Γ and application context Ψ, e has type B. The standard inference mode Γ ⊢ e ⇒ B can be regarded as a special case when the application context is empty. Note that variable names are assumed to be fresh when new variables are added to the typing context or when new type variables are generated.

Rule T-Var says that if x : A is in the typing context, and A is a subtype of B under application context Ψ, then x has type B. It depends on the subtyping rules explained in Sect. 3.3. Rule T-Int shows that integer literals are only inferred to have type Int under an empty application context. This is obvious, since an integer cannot accept any arguments.

T-Lam shows the strength of application contexts. It states that, without annotations, if the application context is non-empty, a type can be popped from the application context to serve as the type for x. Inference of the body then continues with the rest of the application context. This is possible, because the

**Fig. 1.** Syntax-directed typing and subtyping.

expression λx. e is being applied to an argument of type A, which is the type at the top of the application context stack. Rule T-Lam2 deals with the case when the application context is empty. In this situation, a monotype τ is *guessed* for the argument, just like the Hindley-Milner system.

Rule T-LamAnn1 works as expected with an empty application context: a new variable x is put with its type A into the typing context, and inference continues on the abstraction body. If the application context is non-empty, then rule T-LamAnn2 applies: it checks that C, the type on top of the application context, is a subtype of A before putting x : A in the typing context. Note, however, that it is always possible to remove the annotation on an abstraction if it has been applied to some arguments.

Rule T-App pushes types into the application context. The application rule first infers the type of the argument e2, obtaining some type A. Then the type A is generalized, in the same way that types in **let** expressions are generalized in the HM type system; the resulting generalized type is B. The generalization is shown in rule T-Gen, where all free type variables are extracted to quantifiers. The type of e1 is then inferred under an application context extended with type B. The generalization step is important for inferring higher-ranked types: since B is a possibly polymorphic type, and B is the argument type of e1, e1 may have a higher-ranked type.
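The generalization step can be sketched as follows (our own illustrative Python encoding of a T-Gen-style generalization, not the paper's formal rule; types are `"Int"`, `("tv", a)`, `("->", A, B)`, and `("all", a, A)`):

```python
def free_tvs(t):
    """Free type variables of a type."""
    if isinstance(t, tuple) and t[0] == "tv":
        return {t[1]}
    if isinstance(t, tuple) and t[0] == "->":
        return free_tvs(t[1]) | free_tvs(t[2])
    if isinstance(t, tuple) and t[0] == "all":
        return free_tvs(t[2]) - {t[1]}
    return set()

def generalize(ctx, t):
    """Quantify over type variables free in t but not free in the context."""
    env_tvs = set().union(set(), *(free_tvs(a) for a in ctx.values()))
    for a in sorted(free_tvs(t) - env_tvs):
        t = ("all", a, t)
    return t

# Generalizing a → a in an empty context yields ∀a. a → a, which T-App
# then pushes onto the application context as the argument's type.
print(generalize({}, ("->", ("tv", "a"), ("tv", "a"))))
```

A variable free in the context stays unquantified, exactly as in HM-style let generalization.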

*Let Expressions.* The language does not have built-in **let** expressions, but instead supports **let** as syntactic sugar. The typing rule for **let** expressions in the HM system is (ignoring, for now, the parts involving Ψ):

$$\frac{\Gamma \vdash e_1 \Rightarrow A_1 \qquad \Gamma_{gen}(A_1) = A_2 \qquad \Gamma, x:A_2 \vdash_{\Psi} e_2 \Rightarrow B}{\Gamma \vdash_{\Psi} \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 \Rightarrow B} \; \text{T-Let}$$

where we generalize the type of e1, which is then assigned as the type of x while inferring e2. Adapting this rule to our system with application contexts results in the Ψ parts of the rule, where the application context is only used for e2, because e2 is the expression being applied. If we desugar the **let** expression (**let** x = e1 **in** e2) to ((λx. e2) e1), we have the following derivation:

$$\frac{\Gamma \vdash e_1 \Rightarrow A_1 \qquad \Gamma_{gen}(A_1) = A_2 \qquad \dfrac{\Gamma, x:A_2 \vdash_{\Psi} e_2 \Rightarrow B}{\Gamma \vdash_{\Psi, A_2} \lambda x.\ e_2 \Rightarrow A_2 \to B}\ \text{T-Lam}}{\Gamma \vdash_{\Psi} (\lambda x.\ e_2)\ e_1 \Rightarrow B} \; \text{T-App}$$

The type A2 is now pushed onto the application context in rule T-App, and then assigned to x in rule T-Lam. Comparing this with the typing derivation using rule T-Let, we now have the same preconditions. Thus we can see that the rules in Fig. 1 are sufficient to express an HM-style polymorphic let construct.

*Meta-Theory.* The type system enjoys several interesting properties, especially lemmas about application contexts. Before we present those lemmas, we need a helper definition of what it means to use arrows on application contexts.

**Definition 1 (**Ψ → B**).** If Ψ = A1, A2, ..., An, then Ψ → B means the function type An → ... → A2 → A1 → B.

Such a definition is useful to reason about typing results with application contexts. One specific property is that the application context determines the form of the typing result.
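Definition 1 is a simple fold over the stack, which can be sketched as (illustrative encoding; the stack's top is the last list element, so A1 ends up innermost):

```python
def psi_arrow(psi, b):
    """Psi -> B where Psi = A1, ..., An gives An -> ... -> A2 -> A1 -> B."""
    for a in psi:                 # A1 is wrapped first, so it ends up innermost
        b = ("->", a, b)
    return b

# Psi = A1, A2 gives A2 -> A1 -> B.
print(psi_arrow(["A1", "A2"], "B"))  # ('->', 'A2', ('->', 'A1', 'B'))
```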

**Lemma 1 (**Ψ **Coincides with Typing Results).** *If* Γ ⊢Ψ e ⇒ A*, then for some* A′*, we have* A = Ψ → A′*.*

Given this lemma, we can always use the judgment Γ ⊢Ψ e ⇒ Ψ → A instead of Γ ⊢Ψ e ⇒ A.

In traditional bi-directional type checking, we often have a subsumption rule that bridges the inference and checked modes: if an expression can be inferred to have some type, then it can be checked against that type. In our system, we regard the normal inference mode Γ ⊢ e ⇒ A as a special case where the application context is empty. We can also move from the normal inference mode into the application mode with an application context:

#### **Lemma 2 (Subsumption).** *If* Γ ⊢ e ⇒ Ψ → A*, then* Γ ⊢Ψ e ⇒ Ψ → A*.*

The relationship between our system and the standard Hindley-Milner type system can be established through the desugaring of let expressions. Namely, if e is typeable in the Hindley-Milner system, then the desugared expression |e| is typeable in our system, with a more general typing result.

**Lemma 3 (Conservativity over HM).** *If* Γ ⊢HM e ⇒ A*, then for some* B*, we have* Γ ⊢ |e| ⇒ B *and* B <: A*.*

### **3.3 Subtyping**

We present our subtyping rules at the bottom of Fig. 1. Interestingly, our subtyping has two different forms.

*Subtyping.* The first judgment follows Odersky and Läufer [24]. A <: B means that A is more polymorphic than B or, equivalently, that A is a subtype of B. Rules S-Int and S-Var are trivial. Rule S-ForallR states that A is a subtype of ∀a.B only if A is a subtype of B, under the assumption that a is a fresh variable. Rule S-ForallL says that ∀a.A is a subtype of B if we can instantiate it with some monotype τ and show that the result is a subtype of B. In rule S-Fun, we see that subtyping is contravariant in the argument type and covariant in the return type.

*Application Subtyping.* The typing rule T-Var uses the second subtyping judgment Ψ ⊢ A <: B. To motivate this new kind of judgment, consider the expression id 1, whose derivation is stuck at T-Var (here we assume id : ∀a.a → a ∈ Γ):

$$\frac{\Gamma \vdash 1 \Rightarrow \mathsf{Int} \qquad \Gamma_{gen}(\mathsf{Int}) = \mathsf{Int} \qquad \dfrac{\mathsf{id} : \forall a.\,a \to a \in \Gamma \qquad ??}{\Gamma \vdash_{\mathsf{Int}} \mathsf{id} \Rightarrow\ ??}\ \textsc{T-Var}}{\Gamma \vdash \mathsf{id}\ 1 \Rightarrow\ ??}\ \textsc{T-App}$$

Here we know that id : ∀a.a → a and also, from the application context, that id is applied to an argument of type Int. Thus we need a mechanism for solving the instantiation a = Int and returning the supertype Int → Int as the type of id. This is precisely what application subtyping achieves: it resolves instantiation constraints according to the application context. Notice that, unlike existing work [14,27], application subtyping solves instantiation more *locally*, since it does not depend mutually on typing.

Back to the rules in Fig. 1: one way to understand the judgment Ψ ⊢ A <: B from a computational point of view is that the type B is a *computed* output rather than an input. In other words, B is determined from Ψ and A. This is unlike the judgment A <: B, where both A and B would be computationally interpreted as inputs. Therefore it is not possible to view A <: B as a special case of Ψ ⊢ A <: B where Ψ is empty.

There are three rules dealing with application contexts. Rule S-Empty applies when the application context is empty. Because it is empty, there are no constraints on the type, so we return it unchanged. Note that this is where HM systems (and also Peyton Jones et al. [27]) would normally use a rule Inst to remove top-level quantifiers:

$$\frac{}{\forall \overline{a}.\,A <: A[\overline{a} \mapsto \overline{\tau}]}\ \textsc{Inst}$$

Our system does not need Inst because, in applications, type information flows from arguments to the function instead of from the function to arguments. In the latter case, Inst is needed because a function type is wanted instead of a polymorphic type. In our approach, instantiation of type variables is avoided unless necessary.

The two remaining rules apply when the application context is non-empty, for polymorphic and function types respectively. We only need to deal with these two cases because Int and type variables a cannot have a non-empty application context. In rule S-Forall2, we instantiate the polymorphic type with some τ and continue; this instantiation is forced by the application context. In rule S-Fun2, a function of type A → B is being applied to an argument of type C, so we check C <: A. Then we continue with B and the rest of the application context, and return C → D as the result type of the function.
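To make this discipline concrete, the following Python sketch (our illustration, not the paper's algorithm) computes the supertype Ψ → B from Ψ and A. It stands in for the guessed τ of rule S-Forall2 with a placeholder meta variable that is solved when S-Fun2 compares the argument type; for simplicity it only matches monotype arguments structurally and ignores the contravariance of rule S-Fun.

```python
import itertools

# Types: 'Int', ('var', a), ('fun', A, B), ('forall', a, A), ('meta', n).
# A meta variable ('meta', n) stands for the tau guessed in rule S-Forall2.
fresh = itertools.count()

def subst(ty, a, repl):
    """[a -> repl]ty (bound variables are assumed to be distinct)."""
    if isinstance(ty, str):
        return ty
    tag = ty[0]
    if tag == 'var':
        return repl if ty[1] == a else ty
    if tag == 'fun':
        return ('fun', subst(ty[1], a, repl), subst(ty[2], a, repl))
    if tag == 'forall':
        return ty if ty[1] == a else ('forall', ty[1], subst(ty[2], a, repl))
    return ty  # 'meta'

def resolve(ty, sol):
    """Chase solved meta variables inside ty."""
    if isinstance(ty, tuple) and ty[0] == 'meta' and ty[1] in sol:
        return resolve(sol[ty[1]], sol)
    if isinstance(ty, tuple) and ty[0] == 'fun':
        return ('fun', resolve(ty[1], sol), resolve(ty[2], sol))
    return ty

def match(pat, target, sol):
    """One-way matching: solve metas in pat so that pat equals target."""
    pat, target = resolve(pat, sol), resolve(target, sol)
    if pat == target:
        return True
    if isinstance(pat, tuple) and pat[0] == 'meta':
        sol[pat[1]] = target
        return True
    if (isinstance(pat, tuple) and isinstance(target, tuple)
            and pat[0] == target[0] == 'fun'):
        return match(pat[1], target[1], sol) and match(pat[2], target[2], sol)
    return False

def app_subtype(psi, ty):
    """Compute B such that  Psi |- ty <: Psi -> B."""
    sol = {}
    def go(psi, ty):
        ty = resolve(ty, sol)
        if not psi:                                       # S-Empty: no constraints left
            return ty
        if isinstance(ty, tuple) and ty[0] == 'forall':   # S-Forall2: instantiate
            return go(psi, subst(ty[2], ty[1], ('meta', next(fresh))))
        if isinstance(ty, tuple) and ty[0] == 'fun':      # S-Fun2: argument solves tau
            c, rest = psi[0], psi[1:]
            assert match(ty[1], c, sol), "argument type mismatch"
            return ('fun', resolve(ty[1], sol), go(rest, ty[2]))
        raise TypeError("non-function type with non-empty application context")
    return go(list(psi), ty)

# id : forall a. a -> a, applied to an Int argument, gets supertype Int -> Int.
id_ty = ('forall', 'a', ('fun', ('var', 'a'), ('var', 'a')))
print(app_subtype(['Int'], id_ty))   # ('fun', 'Int', 'Int')
```

Note that, as the paper emphasizes, the output type is computed from Ψ and A; the derivation for id 1 goes through without a separate Inst rule.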

*Meta-Theory.* Application subtyping is novel in our system, and it enjoys some interesting properties. For example, similarly to typing, the application context determines the form of the supertype.

**Lemma 4 (**Ψ **Coincides with Subtyping Results).** *If* Ψ ⊢ A <: B*, then for some* B′*,* B = Ψ → B′*.*

Therefore we can always use the judgment Ψ ⊢ A <: Ψ → B′ instead of Ψ ⊢ A <: B. Application subtyping is also reflexive and transitive. Interestingly, if we remove all application contexts from the following lemmas, they are exactly the reflexivity and transitivity of traditional subtyping.

**Lemma 5 (Reflexivity).** Ψ ⊢ Ψ → A <: Ψ → A*.*

**Lemma 6 (Transitivity).** *If* Ψ1 ⊢ A <: Ψ1 → B*, and* Ψ2 ⊢ B <: Ψ2 → C*, then* Ψ2, Ψ1 ⊢ A <: Ψ1 → Ψ2 → C*.*

Finally, we can convert between subtyping and application subtyping. We can remove the application context and still get a subtyping relation:

**Lemma 7 (**Ψ ⊢ <: **to** <:**).** *If* Ψ ⊢ A <: B*, then* A <: B*.*

Transferring from subtyping to application subtyping results in a more general type.

**Lemma 8 (**<: **to** Ψ ⊢ <:**).** *If* A <: Ψ → B1*, then for some* B2*, we have* Ψ ⊢ A <: Ψ → B2*, and* B2 <: B1*.*

This lemma may not seem intuitive at first glance. Consider a concrete example: Int → ∀a.a <: Int → Int, and Int ⊢ Int → ∀a.a <: Int → ∀a.a. The former holds because ∀a.a <: Int in the return type. In the latter, after Int is consumed from the application context, we eventually reach S-Empty, which always returns the original type unchanged.

### **3.4 Translation to System F, Coherence and Type-Safety**

We translate the source language into a variant of System F that is also used in Peyton Jones et al. [27]. The translation is shown to be coherent and type safe. Due to space limitations, we only summarize the key aspects of the translation. Full details can be found in the supplementary materials of the paper.

The syntax of our target language is as follows:

$$\text{Expressions} \quad s, f ::= x \mid n \mid \lambda x{:}A.\ s \mid \Lambda a.\ s \mid s_1\ s_2 \mid s\ A$$

In the translation, we use f to refer to coercion functions produced by the subtyping translation, and s to refer to translated terms in System F. We write Γ ⊢_F s : A to mean that the term s has type A in System F.

The type-directed translation follows the rules in Fig. 1, extending each judgment with a translation output. For example, A <: B ⇝ f means that if A <: B holds in the source language, we can translate it into a System F term f, which is a coercion function of type A → B. We prove that our system is type safe by proving that the translation produces well-typed terms.

**Lemma 9 (Typing Soundness).** *If* Γ ⊢_Ψ e ⇒ A ⇝ s*, then* Γ ⊢_F s : A*.*

However, there could be multiple target terms for one expression, due to the multiple choices for τ. To prove that the translation is coherent, we prove that all translations of one expression have the same operational semantics. We write |s| for expressions after type erasure, since types are useless after type checking. Because different targets may contain different numbers of coercion functions, we use η-id equality [5] instead of syntactic equality: two expressions are regarded as equivalent if they can be turned into the same expression through η-reduction or removal of redundant identity functions. We then prove that our translation actually generates a *unique* target:

**Lemma 10 (Coherence).** *If* Γ ⊢_Ψ e ⇒ A ⇝ s1*, and* Γ ⊢_Ψ e ⇒ B ⇝ s2*, then* |s1| =η-id |s2|*.*

### **3.5 Algorithmic System**

Even though our specification is syntax-directed, it does not directly lead to an algorithm, because there are still many guesses in the system, such as in rule T-Lam2. This subsection gives a brief introduction to the algorithm, which essentially follows the approach of Peyton Jones et al. [27]. Full details can be found in the supplementary materials.

Instead of guessing, the algorithm creates meta type variables α̂, β̂, which are waiting to be solved. The judgment for the algorithmic type system is (S0, N0) ⊢ Γ ⊢_Ψ e ⇒ A ↪ (S1, N1). Here we use N as a name supply, from which we can always extract fresh names, and S as the substitution that maps meta type variables to their solutions. For example, rule T-Lam2 becomes:

$$\frac{(S_0, N_0) \vdash \Gamma, x{:}\widehat{\beta} \vdash e \Rightarrow A \hookrightarrow (S_1, N_1)}{(S_0, N_0\widehat{\beta}) \vdash \Gamma \vdash \lambda x.\ e \Rightarrow \widehat{\beta} \to A \hookrightarrow (S_1, N_1)}\ \textsc{AT-Lam2}$$

Comparing it to rule T-Lam2, τ is replaced by a new meta type variable β̂ from the name supply N0β̂. But despite the name supply and substitution, the rule retains the structure of T-Lam2.

With the name supply and substitutions, the algorithmic system is a direct extension of the specification in Fig. 1, with a unification process that solves meta type variables. The unification process is quite standard and similar to the one used in the Hindley-Milner system. We prove that our algorithm is sound and complete with respect to the specification.
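As a sketch of that standard process, the following Python snippet (our illustration, with assumed type encodings, not the paper's algorithm) implements first-order unification over types containing meta type variables; the dictionary S plays the role of the substitution threaded through the algorithmic judgment, and the counter plays the role of the name supply N.

```python
import itertools

# Types: 'Int', ('meta', n), ('fun', A, B).  S maps meta variables to types.
supply = itertools.count()          # the name supply N: always yields a fresh meta variable

def resolve(ty, s):
    """Follow solved meta variables to their current representative."""
    while isinstance(ty, tuple) and ty[0] == 'meta' and ty[1] in s:
        ty = s[ty[1]]
    return ty

def occurs(n, ty, s):
    """Occurs check: does meta variable n appear in ty (under s)?"""
    ty = resolve(ty, s)
    if isinstance(ty, tuple):
        if ty[0] == 'meta':
            return ty[1] == n
        if ty[0] == 'fun':
            return occurs(n, ty[1], s) or occurs(n, ty[2], s)
    return False

def unify(t1, t2, s):
    """Extend substitution s so that t1 and t2 become equal; raise if impossible."""
    t1, t2 = resolve(t1, s), resolve(t2, s)
    if t1 == t2:
        return s
    for a, b in ((t1, t2), (t2, t1)):
        if isinstance(a, tuple) and a[0] == 'meta':
            if occurs(a[1], b, s):
                raise TypeError("occurs check failed")
            s[a[1]] = b
            return s
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] == 'fun'):
        return unify(t1[2], t2[2], unify(t1[1], t2[1], s))
    raise TypeError("cannot unify")

# Solving the meta variable created for an unannotated lambda parameter:
beta = ('meta', next(supply))
s = unify(('fun', beta, 'Int'), ('fun', 'Int', 'Int'), {})
print(resolve(beta, s))   # Int
```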

**Theorem 1 (Soundness).** *If* ([], N0) ⊢ Γ ⊢ e ⇒ A ↪ (S1, N1)*, then for any substitution* V *with* dom(V) = *fmv*(S1Γ, S1A)*, we have* V S1Γ ⊢ e ⇒ V S1A*.*

**Theorem 2 (Completeness).** *If* Γ ⊢ e ⇒ A*, then for a fresh* N0*, we have* ([], N0) ⊢ Γ ⊢ e ⇒ B ↪ (S1, N1)*, and for some* S2*, we have* Γ(S2S1B) <: Γ(A)*.*

### **4 More Expressive Type Applications**

This section presents a System-F-like calculus, showing that the application mode not only works well for calculi with explicit type applications, but also adds interesting expressive power, while retaining uniqueness of types for *explicitly* polymorphic functions. An additional novelty in this section is another possible variant of the typing and subtyping rules for the application mode, exploiting the lemmas presented in Sects. 3.2 and 3.3.

$$\begin{aligned} \langle \emptyset \rangle A &= A & \langle \varGamma, x:B \rangle A &= \langle \varGamma \rangle A \\ \langle \varGamma, a \rangle A &= \langle \varGamma \rangle A & \langle \varGamma, a = B \rangle A &= \langle \varGamma \rangle (A [a \mapsto B]) \end{aligned}$$

**Fig. 2.** Apply contexts as substitutions on types.

$$\frac{a \in \Gamma}{\Gamma \vdash a}\ \textsc{Wf-TVar} \quad \frac{}{\Gamma \vdash \mathsf{Int}}\ \textsc{Wf-Int} \quad \frac{\Gamma \vdash A \quad \Gamma \vdash B}{\Gamma \vdash A \to B}\ \textsc{Wf-Arrow} \quad \frac{\Gamma, a \vdash A}{\Gamma \vdash \forall a.A}\ \textsc{Wf-All}$$

**Fig. 3.** Well-formedness.

#### **4.1 Syntax**

We focus on a new variant of the standard System F. The syntax is as follows:

$$\begin{array}{ll}
\text{Expressions} & e ::= x \mid n \mid \lambda x{:}A.\ e \mid \lambda x.\ e \mid e_1\ e_2 \mid \Lambda a.\ e \mid e\ [A] \\
\text{Types} & A ::= a \mid \mathsf{Int} \mid A \to B \mid \forall a.A \\
\text{Typing contexts} & \Gamma ::= \varnothing \mid \Gamma, x{:}A \mid \Gamma, a \mid \Gamma, a = A \\
\text{Application contexts} & \Psi ::= \varnothing \mid \Psi, A \mid \Psi, [A]
\end{array}$$

The syntax is mostly standard. Expressions include variables x, integers n, annotated abstractions λx : A. e, unannotated abstractions λx. e, applications e1 e2, type abstractions Λa. e, and type applications e [A]. Types include type variables a, the integer type Int, function types A → B, and polymorphic types ∀a.A.

The main novelties are in the typing and application contexts. Typing contexts contain the usual term variable bindings x : A, type variables a, and type equations a = A, which track equalities and are not available in System F. Application contexts use A for the *argument type* of a term-level application, and [A] for the *type argument itself* of a type application.

*Applying Contexts.* Typing contexts contain type equations, which can be used as substitutions. For example, a = Int, x : Int, b = *Bool* can be applied to a → b to obtain the function type Int → *Bool*. We write ⟨Γ⟩A for Γ applied as a substitution to the type A. The formal definition is given in Fig. 2.
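The definition in Fig. 2 can be transcribed directly; the following small Python sketch (our encodings of types and contexts, not part of the paper) lets only the equations a = B in the context act on the type:

```python
# Types: 'Int', 'Bool', ('var', a), ('fun', A, B).
# Context entries: ('tvar', a), ('bind', x, A), ('eq', a, B); innermost entry last.

def subst(ty, a, repl):
    """[a -> repl]ty."""
    if isinstance(ty, str):
        return ty
    if ty[0] == 'var':
        return repl if ty[1] == a else ty
    if ty[0] == 'fun':
        return ('fun', subst(ty[1], a, repl), subst(ty[2], a, repl))
    return ty

def apply_ctx(ctx, ty):
    """<Gamma>A from Fig. 2: term bindings and plain type variables are skipped,
    and each equation a = B is applied, innermost equation first."""
    for entry in reversed(ctx):
        if entry[0] == 'eq':
            ty = subst(ty, entry[1], entry[2])
    return ty

# <a = Int, x : Int, b = Bool>(a -> b)  =  Int -> Bool
ctx = [('eq', 'a', 'Int'), ('bind', 'x', 'Int'), ('eq', 'b', 'Bool')]
print(apply_ctx(ctx, ('fun', ('var', 'a'), ('var', 'b'))))   # ('fun', 'Int', 'Bool')
```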

*Well-Formedness.* Type well-formedness under typing contexts is given in Fig. 3 and is quite straightforward. Notice that there is no rule for type variables bound by type equations. For example, a is not a well-formed type under the typing context a = Int; instead, its applied form ⟨a = Int⟩a = Int is. In other words, we keep the invariant: *types are always fully substituted under the typing context*.

The well-formedness of typing contexts (Γ ctx) and the well-formedness of application contexts (Γ ⊢ Ψ) can be defined naturally based on the well-formedness of types. The precise definitions can be found in the supplementary materials.

**Fig. 4.** Type system for the new System F variant.

#### **4.2 Type System**

*Typing Judgments.* From Lemmas 1 and 4, we know that the application context always coincides with the typing/subtyping result. This means that the types of the arguments can be recovered from the application context. So instead of the whole type, we can use only the return type as the output type. For example, recall the rule T-Lam from Fig. 1:

$$\frac{\Gamma, x{:}A \vdash_\Psi e \Rightarrow B}{\Gamma \vdash_{\Psi, A} \lambda x.\ e \Rightarrow A \to B}\ \textsc{T-Lam} \qquad \frac{\Gamma, x{:}A \vdash_\Psi e \Rightarrow C}{\Gamma \vdash_{\Psi, A} \lambda x.\ e \Rightarrow C}\ \textsc{T-Lam-Alt}$$

By Lemma 1, we have B = Ψ → C for some C. Instead of B, we can directly return C as the output type, since we can derive from the application context that e has type Ψ → C and λx. e has type (Ψ, A) → C. Thus we obtain the rule T-Lam-Alt.

Note that for the language in Sect. 3 the choice between the two styles of rules is only a matter of taste. However, the new style turns out to be very useful for our variant of System F, since it helps avoid introducing types like ∀a = Int.a. We therefore adopt the new form of judgment. The judgment Γ ⊢_Ψ e ⇒ A is now interpreted as: *under the typing context* Γ *and the application context* Ψ, *the return type of* e *applied to arguments whose types are given by* Ψ *is* A.

*Typing Rules.* Using the new interpretation of the typing judgment, we give the typing rules at the top of Fig. 4. Rule SF-Var depends on the subtyping rules. Rule SF-Int always infers the integer type. Rule SF-LamAnn1 first applies the current context to A, then puts x : ⟨Γ⟩A into the typing context to infer e. The return type is a function type because the application context is empty. Rule SF-LamAnn2 has a non-empty application context, so it requires the type at the top of the application context to be equivalent to ⟨Γ⟩A. The output type is B rather than a function type. Notice how the invariant that types are fully substituted under the typing context is preserved in these two rules.

Rule SF-Lam pops the type <sup>A</sup> from the application context, puts <sup>x</sup> : <sup>A</sup> into the typing context, and returns only the return type <sup>B</sup>. In rule SF-App, the argument type A is pushed into the application context for inferring e1, so the output type B is the type of e<sup>1</sup> under application context (Ψ,A), which is exactly the return type of e<sup>1</sup> e<sup>2</sup> under Ψ.

Rule SF-TLam1 is for type abstractions. The type variable <sup>a</sup> is pushed into the typing context, and the return type is a polymorphic type. In rule SF-TLam2, the application context has the type argument <sup>A</sup> at its top, which means the type abstraction is applied to A. We then put the type equation a = A into the typing context to infer e. Like term-level applications, here we only return the type <sup>B</sup> instead of a polymorphic type. In rule SF-TApp, we first apply the typing context on the type argument A, then we put the applied type argument ΓA into the application context to infer e, and return B as the output type.
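To illustrate how these rules fit together, here is a compact Python sketch of the inference judgment for this System F variant, together with the application subtyping rules of Fig. 4. The encodings are ours, and checks such as well-formedness and scoping are omitted:

```python
# Types: 'Int', ('var', a), ('fun', A, B), ('forall', a, A)
# Terms: ('int', n), ('varx', x), ('lam', x, A, e), ('tlam', a, e),
#        ('app', e1, e2), ('tapp', e, A)
# Application context Psi: list, top first, entries ('arg', A) or ('targ', A)
# Typing context: list of ('bind', x, A), ('tvar', a), ('eq', a, B)

def subst(ty, a, repl):
    if isinstance(ty, str): return ty
    if ty[0] == 'var': return repl if ty[1] == a else ty
    if ty[0] == 'fun': return ('fun', subst(ty[1], a, repl), subst(ty[2], a, repl))
    return ty if ty[1] == a else ('forall', ty[1], subst(ty[2], a, repl))

def apply_ctx(ctx, ty):                      # <Gamma>A, as in Fig. 2
    for entry in reversed(ctx):
        if entry[0] == 'eq':
            ty = subst(ty, entry[1], entry[2])
    return ty

def app_sub(psi, a):                         # Psi |- A <: B, returning B
    if not psi:
        return a                             # SF-SEmpty
    kind, top = psi[0]
    if a[0] == 'forall' and kind == 'targ':  # SF-STApp
        return app_sub(psi[1:], subst(a[2], a[1], top))
    if a[0] == 'fun' and kind == 'arg':      # SF-SApp: argument must match exactly
        assert a[1] == top, "argument type mismatch"
        return app_sub(psi[1:], a[2])
    raise TypeError("ill-formed application")

def infer(ctx, psi, e):                      # Gamma |-_Psi e => A (return type only)
    tag = e[0]
    if tag == 'int':
        assert not psi
        return 'Int'                                        # SF-Int
    if tag == 'varx':                                       # SF-Var
        ty = next(en[2] for en in reversed(ctx) if en[0] == 'bind' and en[1] == e[1])
        return app_sub(psi, ty)
    if tag == 'lam':                                        # annotated: lam x:A. e
        a = apply_ctx(ctx, e[2])
        if not psi:                                         # SF-LamAnn1
            return ('fun', a, infer(ctx + [('bind', e[1], a)], [], e[3]))
        kind, top = psi[0]                                  # SF-LamAnn2
        assert kind == 'arg' and top == a
        return infer(ctx + [('bind', e[1], a)], psi[1:], e[3])
    if tag == 'tlam':
        if not psi:                                         # SF-TLam1
            return ('forall', e[1], infer(ctx + [('tvar', e[1])], [], e[2]))
        kind, top = psi[0]                                  # SF-TLam2: record a = A
        assert kind == 'targ'
        return infer(ctx + [('eq', e[1], top)], psi[1:], e[2])
    if tag == 'app':                                        # SF-App: argument flows in
        a = infer(ctx, [], e[2])
        return infer(ctx, [('arg', a)] + psi, e[1])
    if tag == 'tapp':                                       # SF-TApp
        return infer(ctx, [('targ', apply_ctx(ctx, e[2]))] + psi, e[1])

# (Lam a. lam x:a. x) [Int] 3  has return type Int
idpoly = ('tlam', 'a', ('lam', 'x', ('var', 'a'), ('varx', 'x')))
print(infer([], [], ('app', ('tapp', idpoly, 'Int'), ('int', 3))))   # Int
```

In the example, rule SF-TLam2 pushes the equation a = Int into the typing context, so the annotation a on the inner lambda is substituted to Int before it is compared with the argument type, exactly as described above.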

*Subtyping.* The definition of subtyping is given at the bottom of Fig. 4. As with the typing rules, the argument types corresponding to the application context are omitted from the output. We interpret the judgment Ψ ⊢ A <: B as: under the application context Ψ, A is a subtype of the type whose argument types are given by Ψ and whose return type is B.

Rule SF-SEmpty returns the input type under the empty application context. Rule SF-STApp instantiates a with the type argument A and returns C. Note how application subtyping extends naturally to deal with type applications. Rule SF-SApp requires that the argument type equals the top type in the application context, and returns C.

### **4.3 Meta Theory**

Applying the idea of the application mode to System F results in a well-behaved type system. For example, subtyping transitivity becomes more concise:

**Lemma 11 (Subtyping transitivity).** *If* Ψ1 ⊢ A <: B*, and* Ψ2 ⊢ B <: C*, then* Ψ2, Ψ1 ⊢ A <: C*.*

Also, we still have the interesting subsumption lemma that transfers from the inference mode to the application mode:

**Lemma 12 (Subsumption).** *If* Γ ⊢ e ⇒ A*, and* Γ ⊢ Ψ*, and* Ψ ⊢ A <: B*, then* Γ ⊢_Ψ e ⇒ B*.*

Furthermore, we prove type safety through the progress and preservation lemmas. The detailed definitions of the operational semantics and of values can be found in the supplementary materials.

**Lemma 13 (Progress).** *If* ∅ ⊢ e ⇒ T*, then either* e *is a value, or there exists* e′ *such that* e −→ e′*.*

**Lemma 14 (Preservation).** *If* Γ ⊢_Ψ e ⇒ A*, and* e −→ e′*, then* Γ ⊢_Ψ e′ ⇒ A*.*

Moreover, introducing type equality preserves unique types:

**Lemma 15 (Uniqueness of typing).** *If* Γ ⊢_Ψ e ⇒ A*, and* Γ ⊢_Ψ e ⇒ B*, then* A = B*.*

### **5 Discussion**

This section discusses possible design choices for bi-directional type checking with the application mode, and outlines possible future work.

### **5.1 Combining Application and Checked Modes**

Although the application mode provides us with alternative design choices in a bi-directional type system, a checked mode can still be *easily* added. One motivation for the checked mode would be annotated expressions e : A, where the type of expressions is known and is therefore used to check expressions.

Consider adding e : A to introduce a third, checked mode for the language in Sect. 3. Notice that, since the checked mode is stronger than the application mode, the application context is no longer useful once we enter checked mode. Instead, we use application subtyping to satisfy the application context requirements. A possible typing rule for annotated expressions is:

$$\frac{\Psi \vdash A <: B \qquad \Gamma \vdash e \Leftarrow A}{\Gamma \vdash_\Psi (e : A) \Rightarrow B}\ \textsc{T-Ann}$$

Here, e is checked against its annotation A, and then we instantiate A to B using subtyping with the application context Ψ.

We can then give a set of checked-mode rules for all expressions. For example, one useful rule for abstractions in checked mode is Abs-Chk, where the parameter type A serves as the type of x, and typing checks the body against B. Also, following the information flow, the checked rule for applications checks the function against the full type:

$$\frac{\Gamma, x{:}A \vdash e \Leftarrow B}{\Gamma \vdash \lambda x.\ e \Leftarrow A \to B}\ \textsc{Abs-Chk} \qquad \frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma \vdash e_1 \Leftarrow A \to B}{\Gamma \vdash e_1\ e_2 \Leftarrow B}\ \textsc{App-Chk}$$
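To see how the checked mode propagates information, here is a minimal Python sketch of a simply typed checker with the two modes (our encodings; no polymorphism and no application context, just annotation-driven mode switching in the style of Abs-Chk and App-Chk):

```python
# Types: 'Int', ('fun', A, B)
# Terms: ('int', n), ('varx', x), ('lam', x, e), ('ann', e, A), ('app', e1, e2)

def infer(ctx, e):                       # Gamma |- e => A
    tag = e[0]
    if tag == 'int':
        return 'Int'
    if tag == 'varx':
        return ctx[e[1]]
    if tag == 'ann':                     # annotation switches to checked mode
        check(ctx, e[1], e[2])
        return e[2]
    if tag == 'app':                     # infer the function, check the argument
        f = infer(ctx, e[1])
        assert isinstance(f, tuple) and f[0] == 'fun', "not a function"
        check(ctx, e[2], f[1])
        return f[2]
    raise TypeError("cannot infer an unannotated lambda")

def check(ctx, e, ty):                   # Gamma |- e <= A
    if e[0] == 'lam':                    # Abs-Chk: the parameter type comes from ty
        assert ty[0] == 'fun'
        check({**ctx, e[1]: ty[1]}, e[2], ty[2])
    elif e[0] == 'app':                  # App-Chk: check the function at A -> B
        a = infer(ctx, e[2])
        check(ctx, e[1], ('fun', a, ty))
    else:
        assert infer(ctx, e) == ty       # subsumption: fall back to inference

# ((lam x. x) : Int -> Int) 3  infers type Int; the lambda is handled by Abs-Chk
idfun = ('ann', ('lam', 'x', ('varx', 'x')), ('fun', 'Int', 'Int'))
print(infer({}, ('app', idfun, ('int', 3))))   # Int
```

The unannotated lambda is only typeable via the checked mode, which is exactly the role annotations play in the discussion above.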

Note that expression annotations can be convenient for programmers, since annotations can be placed more freely in a program. However, they do not add expressive power: a program that is typeable with expression annotations remains typeable after moving the annotations to the corresponding bindings.

This discussion is only a sketch; we have defined neither the corresponding declarative system nor the algorithm. However, we believe that the addition of a checked mode will *not* bring surprises to the meta-theory.

### **5.2 Additional Constructs**

In this section, we show that the application mode is compatible with other constructs, by discussing how to add support for pairs in the language given in Sect. 3. A similar methodology would apply to other constructs like sum types, data types, if-then-else expressions and so on.

The introduction rule for pairs must be in the inference mode with an empty application context. Also, the subtyping rule for pairs is as expected.

$$\frac{\Gamma \vdash e_1 \Rightarrow A \qquad \Gamma \vdash e_2 \Rightarrow B}{\Gamma \vdash (e_1, e_2) \Rightarrow (A, B)}\ \textsc{T-Pair} \qquad \frac{A_1 <: B_1 \qquad A_2 <: B_2}{(A_1, A_2) <: (B_1, B_2)}\ \textsc{S-Pair}$$

The application mode also applies to the elimination constructs of pairs. If one component of a pair is a function, as in (**fst** (λx. x, 3) 4), then a judgment may have a non-empty application context. Therefore, we can use application subtyping to account for the application context:

$$\frac{\Gamma \vdash e \Rightarrow (A, B) \qquad \Psi \vdash A <: C}{\Gamma \vdash_\Psi \mathbf{fst}\ e \Rightarrow C}\ \textsc{T-Fst1} \qquad \frac{\Gamma \vdash e \Rightarrow (A, B) \qquad \Psi \vdash B <: C}{\Gamma \vdash_\Psi \mathbf{snd}\ e \Rightarrow C}\ \textsc{T-Snd1}$$

However, in polymorphic type systems we must take the subsumption rule into consideration. For example, in the expression (λx : (∀a.(a, b)). **fst** x), **fst** is applied to a polymorphic type. Interestingly, instead of a non-deterministic subsumption rule, polymorphic types actually lead to a simpler solution. According to the philosophy of the application mode, the types of arguments always flow into functions. Therefore, instead of regarding (**fst** e) as an expression form with e as its argument, we can regard **fst** as a function in its own right, with type ∀ab.(a, b) → a. Then, as in the variable case, we use the subtyping rule to deal with application contexts. The typing rules for **fst** and **snd** can thus be modeled as:

$$\frac{\Psi \vdash (\forall ab.\,(a, b) \to a) <: A}{\Gamma \vdash_\Psi \mathbf{fst} \Rightarrow A}\ \textsc{T-Fst2} \qquad \frac{\Psi \vdash (\forall ab.\,(a, b) \to b) <: A}{\Gamma \vdash_\Psi \mathbf{snd} \Rightarrow A}\ \textsc{T-Snd2}$$

Note that another way to model these two rules would be to simply have an initial typing environment Γinitial ≡ **fst** : (∀ab.(a, b) → a), **snd** : (∀ab.(a, b) → b). In that case the elimination of pairs is dealt with directly by the rule for variables.

An extended version of the calculus presented in Sect. 3, which includes the rules for pairs (T-Pair, S-Pair, T-Fst2 and T-Snd2), has been formally studied. All the theorems presented in Sect. 3 hold with the extension of pairs.

### **5.3 Dependent Type Systems**

One remark about the application mode is that the same idea may be applicable to systems with advanced features, where type inference is sophisticated or even undecidable. One promising application is, for instance, dependent type systems [2,3,10,21,37]. Type systems with dependent types usually unify the syntax of terms and types, with a single lambda abstraction generalizing both type and term abstractions. Unfortunately, this means that the **let** desugaring is not valid in those systems. As a concrete example, desugaring **let** a = Int **in** λx : a. x + 1 yields (λa. λx : a. x + 1) Int, which is ill-typed because the type of x in the abstraction body is the variable a and not Int.

Because **let** cannot be encoded, declarations cannot be encoded either. Modeling declarations in dependently typed languages is a subtle matter, and normally requires some additional complexity [34].

We believe that the technique presented in Sect. 4 can be adapted to a dependently typed language to enable a **let** encoding. In a dependent type system with unified syntax for terms and types, we can combine the two forms of typing-context entry (x : A and a = A) into a unified form x = e : A. We can then combine the two application rules SF-App and SF-TApp into De-App, and the two abstraction rules SF-Lam and SF-TLam2 into De-Lam.

$$\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma \vdash_{\Psi,\ e_2 : A} e_1 \Rightarrow B}{\Gamma \vdash_\Psi e_1\ e_2 \Rightarrow B}\ \textsc{De-App} \qquad \frac{\Gamma, x = e_1 : A \vdash_\Psi e \Rightarrow B}{\Gamma \vdash_{\Psi,\ e_1 : A} \lambda x.\ e \Rightarrow B}\ \textsc{De-Lam}$$

With such rules it would be possible to handle declarations easily in dependent type systems. Note that this is still a rough idea: we have not yet fully worked out the typing rules for such a system.

### **6 Related Work**

#### **6.1 Bi-directional Type Checking**

Bi-directional type checking was popularized by the work of Pierce and Turner [29] and has since been applied to many type systems with advanced features. The alternative application mode we introduce enables a new variant of bi-directional type checking. There have been many other efforts to refine bi-directional type checking.

Colored local type inference [25] refines local type inference for *explicit* polymorphism by propagating partial type information. Their work is built on distinguishing inherited types (known from the context) and synthesized types (inferred from terms). A similar distinction is achieved in our algorithm by manipulating type variables [14]. Also, their information flow is from functions to arguments, which is fundamentally different from the application mode.

The system of *tridirectional* type checking [15] is based on bi-directional type checking and has a rich set of property types including intersections, unions and quantified dependent types, but without parametric polymorphism. Tridirectional type checking has a new direction for supporting type checking unions and existential quantification. Their third mode is basically unrelated to our application mode, which propagates information from outer applications.

Greedy bi-directional polymorphism [13] adopts a greedy idea from Cardelli [4] for bi-directional type checking with higher-ranked types, where the type variables in instantiations are determined by the first constraint. In this way, they support some uses of impredicative polymorphism. However, the greediness also causes many obvious programs to be rejected.

#### **6.2 Type Inference for Higher-Ranked Types**

As a reference, Fig. 5 [14,20] gives a high-level comparison between related works and our system.

*Predicative Systems.* Peyton Jones et al. [27] developed an approach to type inference for higher-rank types using traditional bi-directional type checking, based on Odersky and Läufer [24]. However, in order to instantiate higher-rank types, their system is forced to have an additional type category (ρ types), a special kind of higher-rank type without top-level quantifiers. This complicates their system, since they need additional rule sets for such types. They also combine a variant of the containment relation of Mitchell [23] for deep skolemisation in subsumption rules, which we believe is compatible with our subtyping definition.

Dunfield and Krishnaswami [14] build a simple and concise algorithm for higher-rank polymorphism based on traditional bidirectional type checking. They deal with the same language as Peyton Jones et al. [27], except that they support neither let expressions nor generalization (though these are discussed as design variations). They have a special *application judgment* that delays instantiation until the expression is applied to some argument. As with the application mode, this avoids the additional category of types. Unlike their work, ours supports generalization and HM-style let expressions. Moreover, the use of the application mode introduces several differences as to when and where annotations are needed (see Sect. 2.4 for a related discussion).

*Impredicative Systems. ML<sup>F</sup>* [18,19,32] generalizes ML with first-class polymorphism. *ML<sup>F</sup>* introduces a new type of bounded quantification (either rigid or flexible) for polymorphic types so that instantiation of polymorphic bindings is delayed until a principal type is found. The HML system [20] is proposed as a simplification and restriction of *ML<sup>F</sup>* . HML only uses flexible types, which simplifies the type inference algorithm, but retains many interesting properties and features.

The FPH system [35] introduces boxy monotypes into System F types. One critique of boxy type inference is that the impredicativity is deeply hidden in the algorithmic type inference rules, which makes it hard to understand the interaction between its predicative constraints and impredicative instantiations [31].


**Fig. 5.** Comparison of higher-ranked type inference systems.

### **6.3 Tracking Type Equalities**

Tracking type equalities is useful in various situations. Here we discuss specifically two related cases where tracking equalities plays an important role.

*Type Equalities in Type Checking.* Tracking type equalities is an essential part of type-checking algorithms involving Generalized Algebraic Data Types (GADTs) [6,26,33]. For example, Peyton Jones et al. [26] propose a type inference algorithm based on unification for GADTs, where type equalities apply only to user-specified types. However, reasoning about type equalities in GADTs is essentially different from the approach in Sect. 4: type equalities are introduced by pattern matching in GADTs, while they are introduced through type applications in our system. Also, type equalities in GADTs are local, in the sense that different branches of a pattern match may have different type equalities for the same type variable. In our system, a type equality is introduced globally and never changed. However, we believe the two can be made compatible by distinguishing the different kinds of equalities.

*Equalities in Declarations.* In systems supporting dependent types, type equalities can be introduced by declarations. In the variant of pure type systems proposed by Severi and Poll [34], the expression x = a : A **in** b generates an equality x = a : A in the typing context, which can be used later through δ-reduction. However, δ-reduction rules require careful design, and the conversion rule for δ-reduction makes the type system non-deterministic. One potential use of the application mode is to reduce the complexity of introducing declarations in those type systems, as briefly discussed in Sect. 5.3.

### **7 Conclusion**

We proposed a variant of bi-directional type checking with a new *application mode*, where type information flows from arguments to functions in applications. The application mode is essentially a generalization of the inference mode; it can therefore work naturally with the inference mode, and it avoids the rule duplication that is often needed in traditional bi-directional type checking. The application mode can also be combined with the checked mode, although this often does not add expressiveness. Compared to traditional bi-directional type checking, the application mode opens a new path in the design of type inference/checking.

We have adopted the application mode in two type systems, both of which enjoy many interesting properties and features. Since bi-directional type checking applies to many type systems, we believe the application mode is applicable to a wide variety of them as well. One obvious avenue for future work is to investigate more systems where the application mode brings benefits, including systems with subtyping, intersection types [8,30], static overloading, or dependent types.

**Acknowledgements.** We thank the anonymous reviewers for their helpful comments. This work has been sponsored by the Hong Kong Research Grant Council projects number 17210617 and 17258816.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Behavioural Equivalence via Modalities for Algebraic Effects**

Alex Simpson and Niels Voorneveld

Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia {Alex.Simpson,Niels.Voorneveld}@fmf.uni-lj.si

**Abstract.** The paper investigates behavioural equivalence between programs in a call-by-value functional language extended with a signature of (algebraic) effect-triggering operations. Two programs are considered as being behaviourally equivalent if they enjoy the same behavioural properties. To formulate this, we define a logic whose formulas specify behavioural properties. A crucial ingredient is a collection of *modalities* expressing effect-specific aspects of behaviour. We give a general theory of such modalities. If two conditions, *openness* and *decomposability*, are satisfied by the modalities then the logically specified behavioural equivalence coincides with a modality-defined notion of applicative bisimilarity, which can be proven to be a congruence by a generalisation of Howe's method. We show that the openness and decomposability conditions hold for several examples of algebraic effects: nondeterminism, probabilistic choice, global store and input/output.

### **1 Introduction**

The notion of *behavioural equivalence* between programs is a fundamental concept in the theory of programming languages. A conceptually natural approach to defining behavioural equivalence is to consider two programs as being equivalent if they enjoy the same 'behavioural properties'. This can be made precise by specifying a *behavioural logic* whose formulas express behavioural properties. Two programs M,N are then defined to be equivalent if, for all formulas Φ, it holds that M |= Φ iff N |= Φ (where M |= Φ expresses the satisfaction relation: program M enjoys property Φ).

This logical approach to defining behavioural equivalence has been particularly prominent in concurrency theory, where the classic result is that the equivalence defined by Hennessy-Milner logic [4] coincides with bisimilarity [14,17]. The aim of the present paper is to adapt the logical approach to the very different computational paradigm of *applicative programming with effects*.

© The Author(s) 2018

A. Simpson—Supported by the Slovenian Research Agency, research core funding No. P1–0294.

N. Voorneveld—Supported by the Air Force Office of Scientific Research under award number FA9550-17-1-0326, and by EU-MSCA-RISE project 731143 (CID).

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 300–326, 2018. https://doi.org/10.1007/978-3-319-89884-1_11

More precisely, we consider a call-by-value functional programming language with *algebraic effects* in the sense of Plotkin and Power [21]. Broadly speaking, effects are those aspects of computation that involve a program interacting with its 'environment'; for example: nondeterminism, probabilistic choice (in both cases, the choice is deferred to the environment); input/output; mutable store (the machine state is modified); control operations such as exceptions, jumps and handlers (which interact with the continuation in the evaluation process); etc. Such general effects collectively enjoy common properties identified in the work of Moggi on monads [15]. Among them, algebraic effects play a special role. They can be included in a programming language by adding effect-triggering operations, whose 'algebraic' nature means that effects act independently of the continuation. From the aforementioned examples of effects, only jumps and handlers are non-algebraic. Thus the notion of algebraic effect covers a broad range of effectful computational behaviour. Call-by-value functional languages provide a natural context for exploring effectful programming. From a theoretical viewpoint, other programming paradigms are subsumed; for example, imperative programs can be recast as effectful functional ones. From a practical viewpoint, the combination of effects with call-by-value leads to the natural programming style supported by impure functional languages such as OCaml.

In order to focus on the main contributions of the paper (the behavioural logic and its induced behavioural equivalence), we instantiate "call-by-value functional language with algebraic effects" using a very simple language. Our language is a simply-typed λ-calculus with a base type of natural numbers, general recursion, call-by-value function evaluation, and algebraic effects, similar to [21]; although, for technical convenience, we adopt the (equivalent) formulation of fine-grained call-by-value [13]. The language is defined precisely in Sect. 2. Following [8,21], an operational semantics is given that evaluates programs to *effect trees*.

Section 3 introduces the behavioural logic. In our impure functional setting, the evaluation of a program of type τ results in a computational process that may or may not invoke effects, and which may or may not terminate with a return *value* of type τ . The key ingredient in our logic is an effect-specific family O of *modalities*, where each modality o ∈ O converts a property φ of values of type τ to a property o φ of general programs (called *computations*) of type τ . The idea is that such modalities capture all relevant effect-specific behavioural properties of the effects under consideration.

A main contribution of the paper is to give a general framework for defining such effect modalities, applicable across a wide range of algebraic effects. The general setting is that we have a signature Σ of effect operations, which determines the programming language, and a collection O of modalities, which determines the behavioural logic. In order to specify the semantics of the logic, we require each modality to be assigned a set of unit-type effect trees, which determines the meaning of the modality. Several concrete examples and a detailed general explanation are given in Sect. 3.

In Sect. 4, we consider the relation of *behavioural equivalence* between programs determined by the logic. A fundamental well-behavedness property is that any reasonable program equivalence should be a congruence with respect to the syntactic constructs of the programming language. Our main theorem (Theorem 1) is that, under two conditions on the collection O of modalities, which hold for all the examples of effects we consider, the logically induced behavioural equivalence is indeed a congruence.

In order to prove Theorem 1, we develop an alternative perspective on behavioural equivalence, which is of interest in its own right. In Sect. 5 we show how the modalities O determine a relation of *applicative* O*-bisimilarity*, which is an effect-sensitive version of Abramsky's notion of *applicative bisimilarity* [1]. Theorem 2 shows that applicative O-bisimilarity coincides with the logically defined relation of behavioural equivalence.

The proof of Theorem 1 is then concluded in Sect. 6, where we use Howe's method [5,6] to show that applicative O-bisimilarity is a congruence. Although the proof is technically involved, we give only a brief outline, as the details closely follow the recent paper [9], in which Howe's method is applied to an untyped language with general algebraic effects.

In Sect. 7, we present a variation on our behavioural logic, in which we make the syntax of logical formulas independent of the syntax of the programming language.

Finally, in Sect. 8 we discuss related and further work.

### **2 A Simple Programming Language**

As motivated in the introduction, our chosen base language is a simply-typed call-by-value functional language with general recursion and a ground type of natural numbers, to which we add (algebraic) effect-triggering operations. This means that our language is a call-by-value variant of PCF [20], extended with algebraic effects, resulting in a language similar to the one considered in [21]. In order to simplify the technical treatment of the language, we present it in the style of *fine-grained call-by-value* [13]. This means that we make a syntactic distinction between *values* and *computations*, representing the static and dynamic aspects of the language respectively. Furthermore, all *sequencing* of computations is performed using a single language construct, the **let** construct. The resulting language is straightforwardly intertranslatable with the more traditional call-by-value formulation. But the encapsulation of all sequencing within a single construct has the benefit of avoiding redundancy in proofs.

Our types are just the simple types obtained by iterating the function type construction over two base types: **N** of natural numbers, and also a unit type **1**.

**Types**: τ, ρ ::= **1** | **N** | ρ → τ **Contexts**: Γ ::= ∅ | Γ, x : τ

As usual, term variables x are taken from a countably-infinite stock of such variables, and the context Γ, x : τ can only be formed if the variable x does not already appear in Γ.

As discussed above, program terms are separated into two mutually defined but disjoint categories: *values* and *computations*.

**Values**: V,W ::= ∗ | Z | S(V ) | λx.M | x **Computations**: M,N ::= V W | **return** V | **let** M ⇒ x **in** N | **fix** (V ) | **case** V **in** {Z ⇒ M,S(x) ⇒ N}

Here, ∗ is the unique value of the unit type. The values of the type of natural numbers are the *numerals* represented using zero Z and successor S. The values of function type are the λ-abstractions. And a variable x can be considered a value, because, under the call-by-value evaluation strategy of the language, it can only be instantiated with a value.

The computations are: function application V W; the computation that does nothing but return a value V ; a **let** construct for sequencing; a **fix** construct for recursive definition; and a **case** construct that branches according to whether its natural-number argument is zero or positive. The computation **let** M ⇒ x **in** N implements sequencing in the following sense. First the computation M is evaluated. Only in the case that the evaluation of M terminates, with return value V , does the thread of execution continue to N. In this case, the computation N[V /x] is evaluated, and its return value (if any) is the one returned by the **let** construct.

To the pure functional language described above, we add *effect operations*. The collection of effect operations is specified by a set Σ (the *signature*) of such operations, together with, for each σ ∈ Σ, an associated *arity*, which takes one of the four forms below:

$$
\alpha^n \to \alpha \qquad \mathbf{N} \times \alpha^n \to \alpha \qquad \alpha^\mathbf{N} \to \alpha \qquad \mathbf{N} \times \alpha^\mathbf{N} \to \alpha.
$$

The notation here is chosen to be suggestive of the way in which such arities are used in the typing rules below, viewing α as a type variable. Each of the forms of arity has an associated term constructor, for building additional computation terms, with which we extend the above grammar for computation terms.

**Effects**: σ(M<sub>0</sub>, M<sub>1</sub>, ..., M<sub>n−1</sub>) | σ(V; M<sub>0</sub>, M<sub>1</sub>, ..., M<sub>n−1</sub>) | σ(V) | σ(W; V)

Motivating examples of effect operations and their computation terms can be found in Examples 0–5 below.

The typing rules for the language are given in Fig. 1 below. Note that the choice of typing rule for an effect operation σ ∈ Σ depends on its declared arity.

The terms of type τ are the values and computations generated by the constructors above. Every term has a unique *aspect* as either a value or a computation. We write *Val*(τ) and *Com*(τ) respectively for the sets of closed values and closed computations of type τ. So the closed terms of type τ are *Term*(τ) = *Val*(τ) ∪ *Com*(τ). For a natural number n ∈ N, we write n̄ for the numeral S<sup>n</sup>(Z); hence *Val*(**N**) = {n̄ | n ∈ N}.

We now consider some standard signatures of computationally interesting effect operations, which will be used as running examples throughout the paper. (We use the same examples as in [8].)

*Example 0 (Pure functional computation).* This is the trivial case (from an effect point of view) in which the signature Σ of effect operations is empty. The resulting language is a call-by-value variant of PCF [20].

**Fig. 1.** Typing rules

*Example 1 (Error).* We take a set of error labels E. For each e ∈ E there is an effect operator *raise*<sub>e</sub> : α<sup>0</sup> → α which, when invoked by the computation *raise*<sub>e</sub>(), aborts evaluation and outputs e as an error message.

*Example 2 (Nondeterminism).* There is a binary choice operator *or* : α<sup>2</sup> → α which gives two options for continuing the computation. The choice of continuation is under the control of some external agent, which one may wish to model as being cooperative (*angelic*), antagonistic (*demonic*), or *neutral*.

*Example 3 (Probabilistic choice).* Again there is a single binary choice operator *p-or* : α<sup>2</sup> → α which gives two options for continuing the computation. In this case, the choice of continuation is probabilistic, with a probability of 1/2 of either option being chosen. Other weighted probabilistic choices can be programmed in terms of this fair choice operation.

*Example 4 (Global store).* We take a set of locations L for storing natural numbers. For each l ∈ L we have *lookup*<sub>l</sub> : α<sup>**N**</sup> → α and *update*<sub>l</sub> : **N** × α → α. The computation *lookup*<sub>l</sub>(V) looks up the number at location l and passes it as an argument to the function V, and *update*<sub>l</sub>(n; M) stores n at l and then continues with the computation M.

*Example 5 (Input/output).* Here we have two operators: *read* : α<sup>**N**</sup> → α, which reads a number from an input channel and passes it as the argument to a function, and *write* : **N** × α → α, which outputs a number (the first argument) and then continues as the computation given as the second argument.

We next present an operational semantics for our language, under which a computation term evaluates to an *effect tree*: essentially, a coinductively generated term over the operations from Σ, with values and ⊥ (nontermination) as the generators. This idea appears in [8,21], and our technical treatment follows the approach of the latter, adapted to call-by-value.

We define a single-step reduction relation between configurations (S, M) consisting of a stack S and a computation M. The computation M is the term under current evaluation. The stack S represents a continuation computation awaiting the termination of M. First, we define a stack-independent reduction relation on computation terms that do not involve **let** at the top level.

$$
\begin{aligned}
(\lambda x{:}\tau.\,M)\,V &\rightsquigarrow M[V/x]\\
\textbf{case}\ Z\ \textbf{in}\ \{Z \Rightarrow M_1;\, S(x) \Rightarrow M_2\} &\rightsquigarrow M_1\\
\textbf{case}\ S(V)\ \textbf{in}\ \{Z \Rightarrow M_1;\, S(x) \Rightarrow M_2\} &\rightsquigarrow M_2[V/x]\\
\textbf{fix}(F) &\rightsquigarrow \textbf{return}\ \lambda x{:}\tau.\,\textbf{let}\ F(\lambda y{:}\tau.\,\textbf{let}\ \textbf{fix}\ F \Rightarrow z\ \textbf{in}\ z\,y) \Rightarrow w\ \textbf{in}\ w\,x
\end{aligned}
$$

The behaviour of **let** is implemented using a system of stacks where:

$$\textbf{Stacks: }\ S ::= id \ \mid\ S \circ (\textbf{let}\ (-) \Rightarrow x\ \textbf{in}\ M)$$

We write S{N} for the computation term obtained by 'applying' the stack S to N, defined by:

$$
\begin{aligned}
id\{N\} &= N\\
(S \circ (\textbf{let}\ (-) \Rightarrow x\ \textbf{in}\ M))\{N\} &= S\{\textbf{let}\ N \Rightarrow x\ \textbf{in}\ M\}
\end{aligned}
$$

We write *Stack*(τ, ρ) for the set of stacks S such that, for any N ∈ *Com*(τ), the computation S{N} is a well-typed expression of type ρ. We define a reduction relation on pairs in *Stack*(τ, ρ) × *Com*(τ), denoted (S<sub>1</sub>, M<sub>1</sub>) ↦ (S<sub>2</sub>, M<sub>2</sub>), by:

$$
\begin{aligned}
(S,\ \textbf{let}\ N \Rightarrow x\ \textbf{in}\ M) &\longmapsto (S \circ (\textbf{let}\ (-) \Rightarrow x\ \textbf{in}\ M),\ N)\\
(S,\ R) &\longmapsto (S,\ R') \qquad \text{if}\ R \rightsquigarrow R'\\
(S \circ (\textbf{let}\ (-) \Rightarrow x\ \textbf{in}\ M),\ \textbf{return}\ V) &\longmapsto (S,\ M[V/x])
\end{aligned}
$$
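As a concrete illustration, the fragment of this machine that handles **let** and **return** can be sketched in Python. The encoding below (computations as tagged tuples, with a Python closure standing in for the variable binding of a **let**, and the stack as a list of such closures) is our assumption for illustration, not the paper's:

```python
# A toy model of the (S, M) machine for let-sequencing.
# Computations (assumed encoding): ("return", v) for `return V`, and
# ("let", m, f) for `let M => x in N`, where f maps a value V to the
# continuation computation N[V/x]. The stack S is a list of such f's.

def step(stack, comp):
    """One machine transition; returns the next configuration, or None."""
    if comp[0] == "let":                 # (S, let N => x in M) |-> (S o frame, N)
        _, m, f = comp
        return (stack + [f], m)
    if comp[0] == "return" and stack:    # (S o frame, return V) |-> (S, M[V/x])
        *rest, f = stack
        return (rest, f(comp[1]))
    return None                          # terminal configuration

def run(comp):
    """Iterate from the empty stack id; returns the final returned value."""
    cfg = ([], comp)
    while True:
        nxt = step(*cfg)
        if nxt is None:
            stack, m = cfg
            assert m[0] == "return" and not stack
            return m[1]
        cfg = nxt
```

For instance, `run(("let", ("return", 2), lambda z: ("return", z + 1)))` first evaluates the bound computation and then its continuation, yielding `3`. The remaining rule, which steps an effect-free redex R to R′, is omitted from this sketch.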

We define the notion of *effect tree* for an arbitrary set X, where X is thought of as a set of abstract 'values'.

**Definition 1.** An *effect tree* (henceforth *tree*) over a set X, determined by a signature Σ of effect operations, is a labelled and possibly infinite tree whose nodes have the following possible forms:

- a leaf ⊥, representing nontermination;
- a leaf x, for x ∈ X;
- a node σ(t<sub>0</sub>, ..., t<sub>m−1</sub>), where σ : α<sup>m</sup> → α;
- a node σ<sub>k</sub>(t<sub>0</sub>, ..., t<sub>m−1</sub>), where σ : **N** × α<sup>m</sup> → α and k ∈ N;
- a node σ(t<sub>0</sub>, t<sub>1</sub>, ...), where σ : α<sup>**N**</sup> → α, with one child t<sub>n</sub> for each n ∈ N;
- a node σ<sub>k</sub>(t<sub>0</sub>, t<sub>1</sub>, ...), where σ : **N** × α<sup>**N**</sup> → α and k ∈ N.
We write T X for the set of trees over X. We define a partial ordering on T X by: t<sub>1</sub> ≤ t<sub>2</sub> if t<sub>1</sub> can be obtained by replacing subtrees of t<sub>2</sub> with ⊥. This forms an ω-*complete* partial order, meaning that every ascending sequence t<sub>1</sub> ≤ t<sub>2</sub> ≤ ... has a least upper bound ⨆<sub>n</sub> t<sub>n</sub>. Let *Tree*(τ) := T *Val*(τ); we will define a reduction relation from computations to trees of values.

Given f : X → Y and a tree t ∈ T X, we write t[x → f(x)] ∈ T Y for the tree whose leaves x ∈ X are renamed to f(x). We have a function μ : TTX → T X, which takes a tree r of trees and flattens it to a tree μr ∈ T X, by taking the labelling tree at each non-⊥ leaf of r as the subtree at the corresponding node in μr. The function μ is the multiplication associated with the monad structure of the T operation. The unit of the monad is the map η : X → T X which takes an element x ∈ X and returns a leaf labelled x.
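To make the monad structure concrete, here is a minimal Python sketch of T; the tuple encoding of trees (None for the ⊥ leaf, ("leaf", x) for a value leaf, ("node", sigma, children) for an effect node) is an assumption made for illustration only:

```python
# A minimal sketch of effect trees and the monad structure (eta, mu).
# Tree representation (assumed): None is the bottom leaf, ("leaf", x) is a
# value leaf, ("node", sigma, children) is an effect node with subtrees.

def eta(x):
    """Monad unit: wrap a value as a leaf."""
    return ("leaf", x)

def tree_map(f, t):
    """Rename each leaf x to f(x), leaving bottom and node labels unchanged."""
    if t is None:
        return None
    if t[0] == "leaf":
        return ("leaf", f(t[1]))
    _, sigma, children = t
    return ("node", sigma, [tree_map(f, c) for c in children])

def mu(r):
    """Monad multiplication: flatten a tree of trees by grafting the tree
    labelling each non-bottom leaf into its place."""
    if r is None:
        return None
    if r[0] == "leaf":
        return r[1]          # the leaf's label is itself a tree: graft it in
    _, sigma, children = r
    return ("node", sigma, [mu(c) for c in children])
```

On this encoding, μ(η(t)) = t and μ((T η)(t)) = t, which are the two unit laws of the monad.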

The operational mapping from a computation M ∈ *Com*(τ) to an effect tree is defined intuitively as follows. Start evaluating M in the empty stack *id*, until the evaluation process (which is deterministic) terminates (if this never happens, the tree is ⊥). If the evaluation process terminates at a configuration of the form (*id*, **return** V), then the tree is the leaf V. Otherwise, the evaluation process can only terminate at a configuration of the form (S, σ(...)) for some effect operation σ ∈ Σ. In this case, create an internal node in the tree of the appropriate kind (depending on σ) and generate each child of this node by repeating the above process, evaluating the appropriate continuation computation starting from a configuration with the current stack S.

The following (somewhat technical) definition formalises the idea outlined above in a mathematically concise way. We define a family of maps |−, −|<sub>(−)</sub> : *Stack*(τ, ρ) × *Com*(τ) × N → *Tree*(ρ), indexed over τ and ρ, by:

$$|S,M|_0 = \bot$$

$$|S,M|_{n+1} = \begin{cases} V & \text{if } S = id,\ M = \textbf{return}\ V\\ |S',M'|_n & \text{if } (S,M) \longmapsto (S',M')\\ \sigma(|S,M_0|_n,\dots,|S,M_{m-1}|_n) & \text{if } \sigma: \alpha^m \to \alpha,\ M = \sigma(M_0,\dots,M_{m-1})\\ \sigma(|S,V\,\overline{0}|_n,|S,V\,\overline{1}|_n,\dots) & \text{if } \sigma: \alpha^{\mathbf{N}} \to \alpha,\ M = \sigma(V)\\ \sigma_k(|S,M_0|_n,\dots,|S,M_{m-1}|_n) & \text{if } \sigma: \mathbf{N}\times\alpha^m \to \alpha,\ M = \sigma(\overline{k};M_0,\dots,M_{m-1})\\ \sigma_k(|S,V\,\overline{0}|_n,|S,V\,\overline{1}|_n,\dots) & \text{if } \sigma: \mathbf{N}\times\alpha^{\mathbf{N}} \to \alpha,\ M = \sigma(\overline{k};V)\\ \bot & \text{otherwise} \end{cases}$$

It follows that |S, M|<sub>n</sub> ≤ |S, M|<sub>n+1</sub> in the given ordering on trees. We write |−|<sub>(−)</sub> : *Com*(τ) × N → *Tree*(τ) for the function defined by |M|<sub>n</sub> = |*id*, M|<sub>n</sub>. Using this, we can give the operational interpretation of computation terms as effect trees by defining |−| : *Com*(τ) → *Tree*(τ) by |M| := ⨆<sub>n</sub> |M|<sub>n</sub>.
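The approximants |M|<sub>n</sub> can be made concrete with a small Python sketch for a toy instance with one binary effect operation *or*; the tuple encodings of computations and trees below are assumptions made for illustration. Each rule application consumes one unit of the index n, and an exhausted index yields ⊥, so the results grow in the tree ordering as n increases:

```python
# approx(S, M, n) models |S, M|_n for a toy language.
# Computations (assumed): ("return", v) | ("let", m, f) | ("or", m0, m1)
# Trees (assumed): None for bottom | ("leaf", v) | ("or", t0, t1)

def approx(stack, comp, n):
    if n == 0:
        return None                                  # |S, M|_0 = bottom
    tag = comp[0]
    if tag == "return":
        if not stack:
            return ("leaf", comp[1])                 # (id, return V) gives leaf V
        *rest, f = stack
        return approx(rest, f(comp[1]), n - 1)       # pop a let frame
    if tag == "let":
        _, m, f = comp
        return approx(stack + [f], m, n - 1)         # push a let frame
    _, m0, m1 = comp                                 # an `or` node branches the tree
    return ("or", approx(stack, m0, n - 1), approx(stack, m1, n - 1))
```

For M = **let** *or*(**return** 0, **return** 1) ⇒ z **in** **return** S(z) (in the toy encoding), approx yields ("or", None, None) at n = 3 and ("or", ("leaf", 1), ("leaf", 2)) at n = 5, illustrating the chain |M|<sub>3</sub> ≤ |M|<sub>5</sub>.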

*Example 2 (Nondeterminism, continued).* Nondeterministically generate a natural number: ?N := **let fix**(λx : **1** → **N**. *or*(λy : **1**. Z, λy : **1**. **let** xy ⇒ z **in** S(z))) ⇒ w **in** w∗

### **3 Behavioural Logic and Modalities**

The goal of this section is to motivate and formulate a logic for expressing *behavioural properties* of programs. In our language, program means (well-typed) term, and we shall be interested both in properties of *computations* and in properties of *values*. Accordingly, we define a logic that contains both *value formulas* and *computation formulas*. We shall use lower case Greek letters φ, ψ, . . . for the former, and upper case Greek letters Φ, Ψ, . . . for the latter. Our logic will thus have two satisfaction relations

$$V \models \phi \qquad\qquad\qquad M \models \Phi$$

which respectively assert that "value V enjoys the value property expressed by φ" and "computation M enjoys the computation property expressed by Φ".

In order to motivate the detailed formulation of the logic, it is useful to identify criteria that will guide the design.


For every type τ , we define a collection *VF*(τ ) of *value formulas*, and a collection *CF*(τ ) of *computation formulas*, as motivated above.

Since boolean logical connectives say nothing themselves about computational behaviour, it is a reasonable general principle that 'behavioural properties' should be closed under such connectives. Thus, in keeping with criterion (C2), which asks for maximal expressivity, we close each set *CF*(τ ) and *VF*(τ ), of computation and value formulas, under infinitary propositional logic.

In addition to closure under infinitary propositional logic, each set *VF*(τ ) contains a collection of *basic* value formulas, from which compound formulas are constructed using (infinitary) propositional connectives.<sup>1</sup> The choice of basic formulas depends on the type τ .

<sup>1</sup> We call such formulas *basic* rather than *atomic* because they include formulas such as (V ↦ Φ), discussed below, which are built from other formulas.

In the case of the natural numbers type, we include a basic value formula {n} ∈ *VF*(**N**), for every <sup>n</sup> <sup>∈</sup> <sup>N</sup>. The semantics of this formula are given by:

$$V \models \{n\} \quad \Leftrightarrow \quad V = \overline{n}.$$

By the closure of *VF*(**N**) under infinitary disjunctions, every subset of N can be represented by some value formula. Moreover, since a general value formula in *VF*(**N**) is an infinitary boolean combination of basic formulas of the form {n}, the value formulas represent exactly the subsets of N.

For the unit type, we do not require any basic value formulas. The unit type has only one value, ∗. The two subsets of this singleton set of values are defined by the formulas ⊥ ('falsum', given as an empty disjunction) and ⊤ (the truth constant, given as an empty conjunction).

For a function type τ → ρ, we want each basic formula to express a fundamental behavioural constraint on values (i.e., λ-abstractions) W of type τ → ρ. In keeping with the applicative nature of functional programming, the only way in which a λ-abstraction can be used to generate behaviour is to apply it to an argument of type τ, which, because we are in a call-by-value setting, must be a value V. The application of W to V results in a computation W V of type ρ, whose properties can be probed using computation formulas in *CF*(ρ). Based on this, for every value V ∈ *Val*(τ) and computation formula Φ ∈ *CF*(ρ), we include a basic value formula (V ↦ Φ) ∈ *VF*(τ → ρ) with the semantics:

$$W \models (V \mapsto \Phi) \quad \Leftrightarrow \quad W\,V \models \Phi.$$

Using this simple construct, based on application to a single argument V , other natural mechanisms for expressing properties of λ-abstractions are definable, using infinitary propositional logic. For example, given φ ∈ *VF*(τ ) and Ψ ∈ *CF*(ρ), the definition

$$(\phi \mapsto \Psi) \ := \ \bigwedge \{ (V \mapsto \Psi) \mid V \in Val(\tau),\ V \models \phi \} \tag{1}$$

defines a formula whose derived semantics is

$$W \models (\phi \mapsto \Psi) \quad \Leftrightarrow \quad \forall V \in Val(\tau).\ V \models \phi \text{ implies } WV \models \Psi. \tag{2}$$

In Sect. 7, we shall consider the possibility of changing the basic value formulas in *VF*(τ → ρ) to formulas (φ ↦ Ψ).

It remains to explain how the basic computation formulas in *CF*(τ ) are formed. For this we require a given set O of *modalities*, which depends on the algebraic effects contained in the language. The basic computation formulas in *CF*(τ ) then have the form o φ, where o ∈ O is one of the available modalities, and φ is a value formula in *VF*(τ ). Thus a modality 'lifts' properties of values of type τ to properties of computations of type τ .

In order to give semantics to computation formulas o φ, we need a general theory of the kind of modality under consideration. This is one of the main contributions of the paper. Before presenting the general theory, we first consider motivating examples, using our running examples of algebraic effects.

*Example 0 (Pure functional computation).* Define O = {↓}. Here the single modality ↓ is the *termination modality*: ↓φ asserts that a computation terminates with a return value V satisfying φ. This is formalised using effect trees:

M |= ↓φ ⇔ |M| is a leaf V and V |= φ.

Note that, in the case of pure functional computation, all trees are leaves: either value leaves V , or nontermination leaves ⊥.

*Example 1 (Error).* Define O = {↓} ∪ {E<sub>e</sub> | e ∈ E}. The semantics of the termination modality ↓ is defined as above. The *error modality* E<sub>e</sub> flags the error e:

M |= E<sub>e</sub>φ ⇔ |M| is a node labelled with *raise*<sub>e</sub>.

(Because *raise*<sub>e</sub> is an operation of arity 0, a *raise*<sub>e</sub> node in a tree has 0 children.) Note that the semantics of E<sub>e</sub>φ makes no reference to φ. Indeed, it would be natural to consider E<sub>e</sub> as a basic computation formula in its own right, which could be done by introducing a notion of 0-argument modality, and considering E<sub>e</sub> as such. In this paper, however, we keep the treatment uniform by always considering modalities as unary operations, with natural 0-argument modalities subsumed as unary modalities with a redundant argument.

*Example 2 (Nondeterminism).* Define O = {♦, □} with:

M |= ♦φ ⇔ |M| has some leaf V such that V |= φ
M |= □φ ⇔ |M| has finite height and every leaf is a value V s.t. V |= φ.

Including both modalities amounts to a neutral view of nondeterminism. In the case of angelic nondeterminism, one would include just the ♦ modality; in that of demonic nondeterminism, just the □ modality. Because of the way the semantic definitions interact with termination, the modalities □ and ♦ are not De Morgan duals. Indeed, each of the three possibilities {♦, □}, {♦}, {□} for O leads to a logic with a different expressivity.
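For finite trees over the binary *or* signature, both modalities can be sketched directly in Python (the tuple tree encoding — None for ⊥, ("leaf", v), ("or", t0, t1) — is an assumption for illustration); the example afterwards shows concretely why ♦ and □ fail to be De Morgan duals:

```python
# Sketch of the diamond and box modalities on finite nondeterminism trees.
# phi is a predicate on values; t is None (bottom), ("leaf", v), or ("or", t0, t1).

def diamond(phi, t):
    """Diamond: some leaf of the tree is a value V with phi(V)."""
    if t is None:
        return False
    if t[0] == "leaf":
        return phi(t[1])
    return diamond(phi, t[1]) or diamond(phi, t[2])

def box(phi, t):
    """Box: the tree has no bottom leaves and every value leaf satisfies phi."""
    if t is None:
        return False
    if t[0] == "leaf":
        return phi(t[1])
    return box(phi, t[1]) and box(phi, t[2])
```

For t = or(leaf 0, ⊥) and φ = "equals 0": ♦φ holds and □φ fails, but ♦¬φ also fails; so ¬♦¬φ holds of t while □φ does not, witnessing the failure of duality.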

*Example 3 (Probabilistic choice).* Define O = {P<sub>>q</sub> | q ∈ Q, 0 ≤ q < 1} with:

$$M \models \mathbf{P}_{>q}\,\phi \quad \Leftrightarrow \quad \mathbb{P}(|M| \text{ terminates with a value in } \{V \mid V \models \phi\}) > q,$$

where the probability on the right is the probability that a run through the tree |M|, starting at the root, and making an independent fair probabilistic choice at each branching node, terminates at a value node with a value V in the set {V | V |= φ}. We observe that the restriction to rational thresholds q is immaterial, as, for any real r with 0 ≤ r < 1, we can define:

$$\mathbf{P}_{>r}\,\phi \ := \ \bigvee\{\mathbf{P}_{>q}\,\phi \mid q \in \mathbb{Q},\ r \leq q < 1\}.$$

Similarly, we can define non-strict threshold modalities, for 0 < r ≤ 1, by:

$$\mathbf{P}_{\geq r}\,\phi \ := \ \bigwedge\{\mathbf{P}_{>q}\,\phi \mid q \in \mathbb{Q},\ 0 \leq q < r\}.$$

Also, we can exploit negation to define modalities expressing strict and non-strict upper bounds on probabilities. Notwithstanding the definability of non-strict and upper-bound thresholds, we shall see later that it is important that we include only strict lower-bound modalities in our set O of primitive modalities.
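For a finite tree of fair *p-or* choices, the probability appearing in these definitions can be computed by a simple recursion. The following Python sketch (tuple tree encoding assumed; exact arithmetic via Fraction) weights each branch of a choice node by 1/2 and lets ⊥ contribute nothing:

```python
from fractions import Fraction

# prob(phi, t): the probability that a run through the finite tree t,
# making a fair choice at each branching node, terminates at a value
# leaf satisfying phi.
# Trees (assumed): None for bottom | ("leaf", v) | ("p-or", t0, t1).

def prob(phi, t):
    if t is None:
        return Fraction(0)                  # nontermination contributes nothing
    if t[0] == "leaf":
        return Fraction(int(phi(t[1])))     # 1 if the leaf satisfies phi, else 0
    return (prob(phi, t[1]) + prob(phi, t[2])) / 2

def sat_strict(q, phi, t):
    """Models M |= P_{>q} phi for a finite tree: strict lower bound q."""
    return prob(phi, t) > q
```

A tree with value leaves 0 and 1 and one ⊥ branch at depth two terminates with probability 3/4, so P<sub>>1/2</sub> holds of it while P<sub>>3/4</sub> does not.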

*Example 4 (Global store).* For a set of locations L, define the set of states by *State* = **N**<sup>L</sup>. The modalities are O = {(s ↦ r) | s, r ∈ *State*}, where informally:

M |= (s ↦ r) φ ⇔ the execution of M, starting in state s, terminates in final state r with return value V such that V |= φ.

We make the above definition precise using the effect tree of M. Define

$$exec: TX \times State \to X \times State,$$

for any set X, to be the least partial function satisfying:

$$exec(t, s) = \begin{cases} (x,\ s) & \text{if } t \text{ is a leaf labelled with } x \in X\\ exec(t_{s(l)},\ s) & \text{if } t = \mathrm{lookup}_l(t_0, t_1, \dots) \text{ and } exec(t_{s(l)}, s) \text{ is defined}\\ exec(t',\ s[l := n]) & \text{if } t = \mathrm{update}_{l,n}(t') \text{ and } exec(t', s[l := n]) \text{ is defined,} \end{cases}$$

where s[l := n] is the evident modification of state s. Intuitively, *exec*(t, s) defines the result of "executing" the tree of commands in effect tree t starting in state s, whenever this execution terminates. In terms of operational semantics, it can be viewed as defining a 'big-step' semantics for effect trees (in the signature of global store). We can now define the semantics of the (s ↦ r) modality formally:

$$M \models (s \mapsto r)\,\phi \quad \Leftrightarrow \quad exec(|M|, s) = (V, r) \text{ where } V \models \phi.$$
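The exec function translates into a short Python recursion over an assumed tuple encoding of global-store trees, with states as dictionaries from locations to numbers; an undefined execution is modelled by returning None:

```python
# exec_tree(t, s): execute the commands in effect tree t from state s.
# Trees (assumed): None for bottom | ("leaf", x)
#   | ("lookup", l, branch) with branch an n -> tree function
#   | ("update", l, n, t2) storing n at l and continuing as t2.

def exec_tree(t, s):
    if t is None:
        return None                        # execution undefined
    if t[0] == "leaf":
        return (t[1], s)                   # terminate: return value and final state
    if t[0] == "lookup":
        _, l, branch = t
        return exec_tree(branch(s[l]), s)  # follow the child indexed by s(l)
    _, l, n, t2 = t                        # ("update", l, n, t2)
    s2 = dict(s)
    s2[l] = n                              # the modified state s[l := n]
    return exec_tree(t2, s2)
```

For the tree that looks up a location l, stores the incremented value, and returns the old one, exec_tree on the state {l: 5} yields (5, {l: 6}); so the computation satisfies (s ↦ r) φ exactly when φ holds of the returned value 5.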

*Example 5 (Input/output).* Define an *i/o-trace* to be a word w over the alphabet

$$\{?n \mid n \in \mathbb{N}\} \cup \{!n \mid n \in \mathbb{N}\}.$$

The idea is that such a word represents an input/output sequence, where ?n means the number n is given in response to an input prompt, and !n means that the program outputs n. Define the set of modalities

$$\mathcal{O} = \{ \langle w \rangle{\downarrow},\; \langle w \rangle{\ldots} \mid w \text{ an i/o-trace} \}.$$

The intuitive semantics of these modalities is as follows.

M |= ⟨w⟩↓ φ ⇔ w is a complete i/o-trace for the execution of M, resulting in termination with value V s.t. V |= φ.
M |= ⟨w⟩… φ ⇔ w is an initial i/o-trace for the execution of M.

In order to define the semantics of formulas precisely, we first define relations t |= ⟨w⟩↓ P and t |= ⟨w⟩…, between trees t ∈ T X and subsets P ⊆ X, by induction on words.

$$\frac{n \in \mathbb{N}}{\{n\} \in VF(\mathbb{N})}\,(1) \qquad \frac{V : \tau \quad \Phi \in CF(\rho)}{(V \mapsto \Phi) \in VF(\tau \to \rho)}\,(2) \qquad \frac{\phi \in VF(\tau) \quad o \in \mathcal{O}}{o \,\phi \in CF(\tau)}\,(3)$$

$$\frac{\phi: I \to VF(\tau)}{\bigvee\_{I} \,\phi \in VF(\tau)} (4) \qquad \frac{\phi: I \to VF(\tau)}{\bigwedge\_{I} \,\phi \in VF(\tau)} (5) \qquad \frac{\phi \in VF(\tau)}{\neg \phi \in VF(\tau)} (6)$$

$$\frac{\Phi: I \to CF(\tau)}{\bigvee\_{I} \,\Phi \in CF(\tau)} (7) \qquad \frac{\Phi: I \to CF(\tau)}{\bigwedge\_{I} \,\Phi \in CF(\tau)} (8) \qquad \frac{\Phi \in CF(\tau)}{\neg \Phi \in CF(\tau)} (9)$$

**Fig. 2.** The logic V

(Note that we are overloading the |= symbol.) In the following, we write ε for the empty word, and we use textual juxtaposition for concatenation of words.

$$\begin{aligned} t \models \langle \varepsilon \rangle{\downarrow}\, P \quad &\Leftrightarrow \quad t \text{ is a leaf } x \text{ and } x \in P\\ t \models \langle (?n)\, w \rangle{\downarrow}\, P \quad &\Leftrightarrow \quad t = \operatorname{read}(t\_0, t\_1, \ldots) \text{ and } t\_n \models \langle w \rangle{\downarrow}\, P\\ t \models \langle (!n)\, w \rangle{\downarrow}\, P \quad &\Leftrightarrow \quad t = \operatorname{write}\_n(t') \text{ and } t' \models \langle w \rangle{\downarrow}\, P\\ t \models \langle \varepsilon \rangle{\ldots} \quad &\Leftrightarrow \quad \text{true}\\ t \models \langle (?n)\, w \rangle{\ldots} \quad &\Leftrightarrow \quad t = \operatorname{read}(t\_0, t\_1, \ldots) \text{ and } t\_n \models \langle w \rangle{\ldots}\\ t \models \langle (!n)\, w \rangle{\ldots} \quad &\Leftrightarrow \quad t = \operatorname{write}\_n(t') \text{ and } t' \models \langle w \rangle{\ldots} \end{aligned}$$

The formal semantics of modalities is now easily defined by:

$$\begin{aligned} M \models \langle w \rangle{\downarrow}\, \phi \quad &\Leftrightarrow \quad |M| \models \langle w \rangle{\downarrow}\, \{ V \mid V \models \phi \}, \\ M \models \langle w \rangle{\ldots}\, \phi \quad &\Leftrightarrow \quad |M| \models \langle w \rangle{\ldots}. \end{aligned}$$

Note that, as in Example 1, the formula argument of the ⟨w⟩… modality is redundant. Also, note that our modalities for input/output could naturally be formed by combining the termination modality ↓, which lifts value formulas to computation formulas, with sequences of atomic modalities ?n and !n acting directly on computation formulas. In this paper, we do not include such modalities, acting on computation formulas, in our general theory. But this is a natural avenue for future consideration.

We now give a formal treatment of the logic and its semantics, in full generality. We assume given a signature Σ of effect operations, as in Sect. 2. And we assume given a set O, whose elements we call *modalities*.

We call our main behavioural logic V, where the letter V is chosen as a reference to the fact that the basic formula at function type specifies function behaviour on individual value arguments V .

**Definition 2 (The logic** V**).** The classes *VF*(τ ) and *CF*(τ ) of *value* and *computation formulas*, for each type τ , are mutually inductively defined by the rules in Fig. 2. In these rules, I can be instantiated to any set, allowing for arbitrary conjunctions and disjunctions. When I is ∅, we obtain the special formulas ⊤ = ⋀<sub>∅</sub> and ⊥ = ⋁<sub>∅</sub>. The use of arbitrary index sets means that formulas, as defined, form a proper class. However, we shall see below that countable index sets suffice.

In order to specify the semantics of modal formulas, we require a connection between modalities and effect trees, which is given by an interpretation function

$$[\cdot]: \mathcal{O} \to \mathcal{P}(T\mathbf{1}).$$

That is, every modality o ∈ O is mapped to a subset [o] ⊆ T**1** of unit-type effect trees. Given a subset P ⊆ X (e.g., one given by a formula) and a tree t ∈ T X, we can define a unit-type tree t[∈P] ∈ T**1** as the tree created by replacing those leaves of t that belong to P with ∗ and all other leaves with ⊥. In the case that P is the subset {V | V |= φ} specified by a formula φ ∈ *VF*(τ ), we also write t[|= φ] for t[∈P].
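The reindexing t[∈P] is a simple leaf relabelling. A minimal sketch, under an assumed encoding of finite trees as nested tuples with ⊥ as `None` (the encoding is ours, not the paper's):

```python
def member(t, P):
    """t[in P]: the unit-type tree obtained by sending leaves in P to '*'
    and all other leaves to the undefined tree (encoded as None)."""
    if t is None:                       # the undefined tree maps to itself
        return None
    kind = t[0]
    if kind == "leaf":
        return ("leaf", "*") if t[1] in P else None
    _, op, children = t
    return ("op", op, [member(c, P) for c in children])

# or(3, or(4, 5)) with P = {even numbers}: only the leaf 4 becomes *
t = ("op", "or", [("leaf", 3), ("op", "or", [("leaf", 4), ("leaf", 5)])])
print(member(t, {0, 2, 4}))
# ('op', 'or', [None, ('op', 'or', [('leaf', '*'), None])])
```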

We can now formally define the two satisfaction relations |= ⊆ *Val*(τ )×*VF*(τ ) and |= ⊆ *Com*(τ ) × *CF*(τ ), mutually inductively, by:

$$\begin{aligned} \overline{m} \models \{n\} \quad &\Leftrightarrow \quad m = n\\ W \models (V \mapsto \Phi) \quad &\Leftrightarrow \quad W\,V \models \Phi\\ M \models o\,\phi \quad &\Leftrightarrow \quad |M|[\models \phi] \in [o]\\ W \models \neg\phi \quad &\Leftrightarrow \quad \neg(W \models \phi). \end{aligned}$$

We omit the evident clauses for the other propositional connectives. We remark that all conjunctions and disjunctions are semantically equivalent to countable ones, because value and computation formulas are interpreted over sets of terms, *Val*(τ ) and *Com*(τ ), which are countable.

We end this section by revisiting our running examples, and showing, in each case, that the example modalities presented above are all specified by suitable interpretation functions [·] : O → P(T**1**).

*Example 0 (Pure functional computation).* We have O = {↓}. Define:

$$[\downarrow] \;=\; \{\ast\}.$$

*Example 1 (Error).* We have O = {↓} ∪ {E<sup>e</sup> | e ∈ E}. Define:

$$[\mathsf{E}\_{e}] \;=\; \{\mathit{raise}\_e\}.$$

*Example 2 (Nondeterminism).* We have O = {♦, □}. Define:

$$[\Diamond] \;=\; \{ t \mid t \text{ has some } \ast \text{ leaf} \}, \qquad [\Box] \;=\; \{ t \mid t \text{ has finite height and every leaf is } \ast \}.$$


*Example 3 (Probabilistic choice).* We have O = {P<sub>>q</sub> | q ∈ ℚ, 0 ≤ q < 1}. Define:

$$[\mathsf{P}\_{>q}] \;=\; \{ t \mid \mathbf{P}(t \text{ terminates with a } \ast \text{ leaf}) > q \}.$$

*Example 4 (Global store).* O = {(s ↦ r) | s, r ∈ *State*}. Define:

$$[(s \mapsto r)] \;=\; \{ t \mid exec(t, s) = (\ast, r) \}.$$

*Example 5 (Input/output).* O = {⟨w⟩↓, ⟨w⟩… | w an i/o-trace}. Define:

$$\begin{aligned} [\langle w \rangle{\downarrow}] &= \{ t \mid t \models \langle w \rangle{\downarrow}\, \{\ast\} \}, \\ [\langle w \rangle{\ldots}] &= \{ t \mid t \models \langle w \rangle{\ldots} \}. \end{aligned}$$

### **4 Behavioural Equivalence**

The goal of this section is to precisely formulate our main theorem: under suitable conditions, the behavioural equivalence determined by the logic V of Sect. 3 is a congruence. In order to achieve this, it will be useful to consider the *positive fragment* V<sup>+</sup> of V.

**Definition 3 (The logic** V<sup>+</sup>**).** The logic V<sup>+</sup> is the fragment of V consisting of those formulas in *VF*(τ ) and *CF*(τ ) that do not contain negation.

Whenever we have a logic L whose value and computation formulas are given as subcollections *VF*L(τ ) ⊆ *VF*(τ ) and *CF*L(τ ) ⊆ *CF*(τ ), then L determines a preorder (and hence also an equivalence relation) between terms of the same type and aspect.

**Definition 4 (Logical preorder and equivalence).** Given a fragment L of V, we define the *logical preorder* ⊑<sub>L</sub>, between well-typed terms of the same type and aspect, by:

$$\begin{aligned} V \sqsubseteq\_{\mathcal{L}} W & \quad \Leftrightarrow \quad \forall \phi \in VF\_{\mathcal{L}}(\tau),\; V \models \phi \,\Rightarrow\, W \models \phi \\ M \sqsubseteq\_{\mathcal{L}} N & \quad \Leftrightarrow \quad \forall \Phi \in CF\_{\mathcal{L}}(\tau),\; M \models \Phi \,\Rightarrow\, N \models \Phi \end{aligned}$$

The *logical equivalence* ≡<sub>L</sub> on terms is the equivalence relation induced by the preorder (the intersection of ⊑<sub>L</sub> and its converse).

In the case that formulas in L are closed under negation, it is trivial that the preorder ⊑<sub>L</sub> is already an equivalence relation, and hence coincides with ≡<sub>L</sub>. Thus we shall only refer specifically to the preorder ⊑<sub>L</sub> for fragments, such as V<sup>+</sup>, that are not closed under negation.

The two main relations of interest to us in this paper are the primary relations determined by V and V<sup>+</sup>: full *behavioural equivalence* ≡<sub>V</sub>; and the *positive behavioural preorder* ⊑<sub>V+</sub> (which induces *positive behavioural equivalence* ≡<sub>V+</sub>).

We next formulate the appropriate notion of (pre)congruence to apply to the relations ≡<sub>V</sub> and ⊑<sub>V+</sub>. These two preorders are examples of *well-typed relations* on closed terms. Any such relation can be extended to a relation on open terms in the following way. Given a well-typed relation R on closed terms, we define the *open extension* R<sup>◦</sup>, where Γ ⊢ M R<sup>◦</sup> N : τ holds precisely when, for every well-typed vector of closed values V⃗ : Γ, it holds that M[V⃗] R N[V⃗]. The correct notion of precongruence for a well-typed preorder on closed terms is to ask for its open extension to be *compatible* in the sense of the definition below; see, e.g., [10,19] for further explanation.

**Definition 5 (Compatibility).** A well-typed open relation R is said to be *compatible* if it is closed under the rules in Fig. 3.

We now state our main congruence result, although we have not yet defined the conditions it depends upon.

#### **Fig. 3.** Rules for compatibility

**Theorem 1.** *If* O *is a decomposable set of Scott-open modalities, then the open extensions of* ≡<sub>V</sub> *and* ⊑<sub>V+</sub> *are both compatible. (It is an immediate consequence that the open extension of* ≡<sub>V+</sub> *is also compatible.)*

The Scott-openness condition refers to the *Scott topology* on T**1**.

**Definition 6.** We say that o ∈ O is *upwards closed* if [o] is an upper-closed subset of T**1**; i.e., if t ∈ [o] and t ≤ t′ then t′ ∈ [o].

**Definition 7.** We say that o ∈ O is *Scott-open* if [o] is an open subset in the Scott topology on T**1**; i.e., [o] is upper closed and, whenever t<sub>1</sub> ≤ t<sub>2</sub> ≤ ... is an ascending chain in T**1** with supremum ⊔<sub>i</sub> t<sub>i</sub> ∈ [o], we have t<sub>n</sub> ∈ [o] for some n.

Before formulating the property of *decomposability*, we make some simple observations about the positive preorder ⊑<sub>V+</sub>.

**Lemma 8.** *For any* V<sub>0</sub>, V<sub>1</sub> ∈ *Val*(ρ → τ )*, we have* V<sub>0</sub> ⊑<sub>V+</sub> V<sub>1</sub> *if and only if:*

∀W ∈ *Val*(ρ), ∀Ψ ∈ *CF*<sub>V+</sub>(τ ), V<sub>0</sub> |= (W ↦ Ψ) *implies* V<sub>1</sub> |= (W ↦ Ψ).

**Lemma 9.** *For any* M<sub>0</sub>, M<sub>1</sub> ∈ *Com*(τ )*, we have* M<sub>0</sub> ⊑<sub>V+</sub> M<sub>1</sub> *if and only if:*

$$\forall o \in \mathcal{O},\, \forall \phi \in VF\_{\mathcal{V}^+}(\tau),\; M\_0 \models o\,\phi \;\textit{implies}\; M\_1 \models o\,\phi.$$

Similar characterisations, with appropriate adjustments, hold for behavioural equivalence ≡<sup>V</sup> .

The decomposability property is formulated using an extension of the positive preorder ⊑<sub>V+</sub>, at unit type, from a relation on computations to a relation on arbitrary effect trees. Accordingly, we define a preorder ⪯ on T**1** by:

$$t \preceq t' \quad \Leftrightarrow \quad \forall o \in \mathcal{O}, \ (t \in [o] \Rightarrow t' \in [o]) \land (t[\in \emptyset] \in [o] \Rightarrow t'[\in \emptyset] \in [o]).$$

**Proposition 10.** *For computations* M, N ∈ *Com*(**1**)*, it holds that* |M| ⪯ |N| *if and only if* M ⊑<sub>V+</sub> N*.*

*Proof.* The defining condition for |M| ⪯ |N| unwinds to:

$$\forall o \in \mathcal{O},\; (M \models o\,\top \text{ implies } N \models o\,\top) \,\land\, (M \models o\,\bot \text{ implies } N \models o\,\bot).$$

This coincides with M ⊑<sub>V+</sub> N by Lemma 9.

We now formulate the required notion of decomposability. We first give the general definition, and then follow it with a related notion of *strong decomposability*, which can be more convenient to establish in examples. Both definitions are unavoidably technical in nature.

For any relation R ⊆ X × Y and subset A ⊆ X, we write R<sup>↑</sup>A for the set {y ∈ Y | ∃x ∈ A, x R y}. This allows us to define our required notion concisely.

**Definition 11 (Decomposability).** We say that O is *decomposable* if, for all r, r′ ∈ T T**1**, we have:

$$(\forall A \subseteq T \mathbf{1}, \ r[\in A] \preceq r'[\in \preceq^{\uparrow} A]) \quad \Rightarrow \quad \mu r \preceq \mu r'.$$

Corollary 22 in Sect. 5 may help to motivate the formulation of the above property, which might otherwise appear purely technical. The following stronger version of decomposability, which suffices for all examples considered in this paper, is perhaps easier to understand in its own right.

**Definition 12 (Strong decomposability).** We say that O is *strongly decomposable* if, for every r ∈ T T**1** and o ∈ O for which μr ∈ [o], there exists a collection {(o<sub>i</sub>, o′<sub>i</sub>)}<sub>i∈I</sub> of pairs of modalities such that:

1. ∀i ∈ I, r[∈ [o′<sub>i</sub>]] ∈ [o<sub>i</sub>]; and
2. for every r′ ∈ T T**1**, (∀i ∈ I, r′[∈ [o′<sub>i</sub>]] ∈ [o<sub>i</sub>]) implies μr′ ∈ [o].

**Proposition 13.** *If* O *is strongly decomposable, then it is decomposable.*

*Proof.* Suppose that r[∈ A] ⪯ r′[∈ (⪯<sup>↑</sup>A)] holds for every A ⊆ T**1**. Assume that μr ∈ [o] for some o ∈ O. Then strong decomposability gives a collection {(o<sub>i</sub>, o′<sub>i</sub>)}<sub>I</sub>. By the definition of ⪯, for each o′<sub>i</sub> we have ⪯<sup>↑</sup>[o′<sub>i</sub>] = [o′<sub>i</sub>]. By the initial assumption, r[∈ [o′<sub>i</sub>]] ∈ [o<sub>i</sub>] implies r′[∈ (⪯<sup>↑</sup>[o′<sub>i</sub>])] ∈ [o<sub>i</sub>], and hence r′[∈ [o′<sub>i</sub>]] ∈ [o<sub>i</sub>]. This holds for every i, so by strong decomposability μr′ ∈ [o]. We have shown that μr ∈ [o] implies μr′ ∈ [o]. One can prove similarly that μr[∈ ∅] ∈ [o] implies μr′[∈ ∅] ∈ [o], by observing that ⪯<sup>↑</sup>{x | x[∈ ∅] ∈ [o′<sub>i</sub>]} = {x | x[∈ ∅] ∈ [o′<sub>i</sub>]}. Thus it holds that μr ⪯ μr′, and hence O is decomposable.

We end this section by again looking at our running examples, and showing, in each case, that the identified collection O of modalities is Scott-open (hence upwards closed) and strongly decomposable (hence decomposable). For each of the examples, upwards closure is easily established, so we do not show it here.

*Example 0 (Pure functional computation).* We have O = {↓} and [↓] = {∗}. Scott-openness holds since, if ⊔<sub>i</sub> t<sub>i</sub> = ∗, then for some i we must already have t<sub>i</sub> = ∗. The set is strongly decomposable since μr ∈ [↓] ⇔ r[∈ [↓]] ∈ [↓], which means that r returns a tree t which is the leaf ∗.

*Example 1 (Error).* We have O = {↓} ∪ {E<sub>e</sub> | e ∈ E} and [E<sub>e</sub>] = {*raise*<sub>e</sub>}. Scott-openness holds for both modalities for the same reason as in the previous example, and the set is strongly decomposable since:

$$\mu r \in [\downarrow] \quad \Leftrightarrow \quad r[\in [\downarrow]] \in [\downarrow],$$

which means that r returns a tree t which itself returns ∗, and:

$$\mu r \in [\mathsf{E}\_e] \quad \Leftrightarrow \quad r[\in [\mathsf{E}\_e]] \in [\mathsf{E}\_e] \;\lor\; r[\in [\mathsf{E}\_e]] \in [\downarrow],$$

which means that r raises an error, or returns a tree that raises an error.

*Example 2 (Nondeterminism).* We have O = {♦, □}. The Scott-openness of [♦] = {t | t has some ∗ leaf} holds because, if ⊔<sub>i</sub> t<sub>i</sub> has a ∗ leaf, then that leaf must already be contained in t<sub>i</sub> for some i. Similarly, if ⊔<sub>i</sub> t<sub>i</sub> ∈ [□] then, because [□] = {t | t has finite height and every leaf is a ∗}, the tree ⊔<sub>i</sub> t<sub>i</sub> has finitely many leaves, all of which must be contained in t<sub>i</sub> for some i. Hence t<sub>i</sub> ∈ [□]. Strong decomposability holds because:

μr ∈ [♦] ⇔ r[∈ [♦]] ∈ [♦] and μr ∈ [□] ⇔ r[∈ [□]] ∈ [□].

The right-hand side of the former states that r has as a leaf a tree t which itself has a leaf ∗. That of the latter states that r is finite and all its leaves are finite trees t that have only ∗ leaves. The same arguments show that {♦} and {□} are also decomposable sets of Scott-open modalities.
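The decomposition law for ♦ can be checked concretely on finite trees. A hedged Python sketch under our own encoding (`("leaf", x)` / `("or", l, r)` nodes with ⊥ as `None`; `mu` grafts the leaf trees of r ∈ T T**1** into place):

```python
def may(t):                      # t in [diamond]: t has some * leaf
    if t is None:
        return False
    if t[0] == "leaf":
        return t[1] == "*"
    return may(t[1]) or may(t[2])

def member(t, pred):             # t[in A], for A given as a predicate
    if t is None:
        return None
    if t[0] == "leaf":
        return ("leaf", "*") if pred(t[1]) else None
    return ("or", member(t[1], pred), member(t[2], pred))

def mu(r):                       # mu : TT1 -> T1, flatten a tree of trees
    if r is None:
        return None
    if r[0] == "leaf":
        return r[1]
    return ("or", mu(r[1]), mu(r[2]))

# r has two leaves: the tree or(bottom, *) and the diverging tree bottom
r = ("or", ("leaf", ("or", None, ("leaf", "*"))), ("leaf", None))
lhs = may(mu(r))                 # mu r in [diamond]
rhs = may(member(r, may))        # r[in [diamond]] in [diamond]
print(lhs, rhs)                  # True True
```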

*Example 3 (Probabilistic choice).* We have O = {P<sub>>q</sub> | q ∈ ℚ, 0 ≤ q < 1}. For the Scott-openness of [P<sub>>q</sub>] = {t | **P**(t terminates with a ∗ leaf) > q}, note that **P**(⊔<sub>i</sub> t<sub>i</sub> terminates with a ∗ leaf) is determined by some countable sum over the leaves of ⊔<sub>i</sub> t<sub>i</sub>. If this sum is greater than a rational q, then some finite partial sum must already be above q. The finite sum ranges over finitely many leaves of ⊔<sub>i</sub> t<sub>i</sub>, all of which will be present in t<sub>i</sub> for some i. Hence t<sub>i</sub> ∈ [P<sub>>q</sub>].

We have strong decomposability, since **P**(μr terminates with a ∗ leaf) equals the integral of the function f<sub>r</sub>(x) = *sup*{y ∈ [0, 1] | r[∈ [P<sub>>x</sub>]] ∈ [P<sub>>y</sub>]} from [0, 1] to [0, 1]. Indeed, f<sub>r</sub>(x) gives the probability that r returns a tree t ∈ [P<sub>>x</sub>]. So we know that if ∀x, y, r[∈ [P<sub>>x</sub>]] ∈ [P<sub>>y</sub>] ⇒ r′[∈ [P<sub>>x</sub>]] ∈ [P<sub>>y</sub>], then f<sub>r′</sub>(x) ≥ f<sub>r</sub>(x) for every x. Hence if μr ∈ [P<sub>>q</sub>] then ∫f<sub>r</sub> > q, whence also ∫f<sub>r′</sub> > q, which means μr′ ∈ [P<sub>>q</sub>].
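On finite trees over a single fair binary choice operation, the termination probability used above, and hence membership in [P<sub>>q</sub>], can be computed directly. A hedged sketch (the tuple encoding, with ⊥ as `None`, is ours):

```python
from fractions import Fraction

def prob_star(t):
    """P(t terminates with a * leaf), for finite fair-choice trees."""
    if t is None:                    # the diverging tree: never terminates
        return Fraction(0)
    if t[0] == "leaf":
        return Fraction(1) if t[1] == "*" else Fraction(0)
    _, l, r = t                      # fair probabilistic choice: average
    return (prob_star(l) + prob_star(r)) / 2

def sat_P_gt(t, q):                  # t in [P_{>q}]: strict threshold
    return prob_star(t) > q

t = ("or", ("leaf", "*"), ("or", ("leaf", "*"), None))
print(prob_star(t))                       # 3/4
print(sat_P_gt(t, Fraction(1, 2)))        # True
print(sat_P_gt(t, Fraction(3, 4)))        # False: the threshold is strict
```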

*Example 4 (Global store).* We have O = {(s ↦ s′) | s, s′ ∈ *State*}. For the Scott-openness of [(s ↦ s′)] = {t | *exec*(t, s) = (∗, s′)}, note that if *exec*(⊔<sub>i</sub> t<sub>i</sub>, s) = (∗, s′), there is a single finite branch of ⊔<sub>i</sub> t<sub>i</sub> that follows the path the recursive function *exec* took. This branch must already be contained in t<sub>i</sub> for some i. We also have strong decomposability since:

$$\mu r \in [(s \mapsto s')] \quad \Leftrightarrow \quad \exists s'' \in State,\; r[\in [(s'' \mapsto s')]] \in [(s \mapsto s'')].$$

This just means that *exec*(r, s) = (t, s″) and *exec*(t, s″) = (∗, s′) for some s″.

*Example 5 (Input/output).* We have O = {⟨w⟩↓, ⟨w⟩… | w an i/o-trace}. For the Scott-openness of [⟨w⟩↓] = {t | t |= ⟨w⟩↓ {∗}}, note that the i/o-trace w is witnessed by some finite branch, which, if in ⊔<sub>i</sub> t<sub>i</sub>, must already be in t<sub>i</sub> for some i. The Scott-openness of [⟨w⟩…] = {t | t |= ⟨w⟩…} holds for similar reasons. We have strong decomposability because of the equivalences:

μr ∈ [⟨w⟩↓] ⇔ ∃v, u i/o-traces, vu = w ∧ r[∈ [⟨u⟩↓]] ∈ [⟨v⟩↓],

which means that r follows trace v returning a tree t, and t follows trace u returning ∗; and:

μr ∈ [⟨w⟩…] ⇔ r[∈ [↓]] ∈ [⟨w⟩…] ∨ ∃v, u, vu = w ∧ r[∈ [⟨u⟩…]] ∈ [⟨v⟩↓],

which means that either r follows trace w immediately, or it follows v, returning a tree that follows u.

### **5 Applicative** *O***-(bi)similarity**

In this section we look at an alternative description of our logical preorder. Central to this description is the concept of a *relator* [12,25], which we use to lift a relation on value terms to a relation on computation terms. With our family of modalities O, we can define a relator which takes a relation R ⊆ X × Y and returns the relation O(R) ⊆ T X × T Y , defined by:

$$t\; \mathcal{O}(\mathcal{R})\; t' \quad \Leftrightarrow \quad \forall A \subseteq X,\, \forall o \in \mathcal{O},\; t[\in A] \in [o] \Rightarrow t'[\in (\mathcal{R}^{\uparrow} A)] \in [o].$$

Note that O(*id*<sub>**1**</sub>) = (⪯). Following [9], we use this relation-lifting operation to define notions of applicative similarity and bisimilarity.
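When X, Y, and the trees are finite, the lifting O(R) can be checked by brute force over all subsets A ⊆ X. A hedged Python sketch, with O = {♦, □} as in Example 2 and our own tuple encoding (`("leaf", x)` / `("or", l, r)`, ⊥ as `None`):

```python
from itertools import chain, combinations

def member(t, A):                # t[in A]: leaves in A become "*", others bottom
    if t is None:
        return None
    if t[0] == "leaf":
        return ("leaf", "*") if t[1] in A else None
    return ("or", member(t[1], A), member(t[2], A))

def may(t):                      # [diamond]: some * leaf
    if t is None: return False
    if t[0] == "leaf": return t[1] == "*"
    return may(t[1]) or may(t[2])

def must(t):                     # [box]: finite and every leaf is *
    if t is None: return False
    if t[0] == "leaf": return t[1] == "*"
    return must(t[1]) and must(t[2])

MODALITIES = [may, must]

def lift(R, X, t1, t2):
    """t1 O(R) t2: for all A subseteq X and o, t1[in A] in [o]
    implies t2[in R^up A] in [o]."""
    subsets = chain.from_iterable(combinations(X, k) for k in range(len(X) + 1))
    for A in map(set, subsets):
        up = {y for (x, y) in R if x in A}          # R^up A
        for o in MODALITIES:
            if o(member(t1, A)) and not o(member(t2, up)):
                return False
    return True

R = {(1, 1), (2, 2)}             # the identity relation on {1, 2}
t  = ("or", ("leaf", 1), None)
t2 = ("or", ("leaf", 1), ("leaf", 2))
print(lift(R, {1, 2}, t, t2))    # True
print(lift(R, {1, 2}, t2, t))    # False: t2 may reach 2, t may not
```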

**Definition 14.** An *applicative* O*-simulation* is given by a pair of relations R<sup>v</sup><sub>τ</sub> and R<sup>c</sup><sub>τ</sub> for each type τ, where R<sup>v</sup><sub>τ</sub> ⊆ *Val*(τ )<sup>2</sup> and R<sup>c</sup><sub>τ</sub> ⊆ *Com*(τ )<sup>2</sup>, such that:

1. V R<sup>v</sup><sub>**N**</sub> W ⇒ (V = W)
2. M R<sup>c</sup><sub>τ</sub> N ⇒ |M| O(R<sup>v</sup><sub>τ</sub>) |N|
3. V R<sup>v</sup><sub>ρ→τ</sub> W ⇒ ∀U ∈ *Val*(ρ), V U R<sup>c</sup><sub>τ</sub> W U

*Applicative* O*-similarity* is the largest applicative O-simulation, which is equal to the union of all applicative O-simulations.

**Definition 15.** An *applicative* O*-bisimulation* is a symmetric O-simulation. The relation of O*-bisimilarity* is the largest applicative O-bisimulation.

**Lemma 16.** *Applicative* O*-bisimilarity is identical to the relation of applicative* (O ∩ O<sup>op</sup>)*-similarity, where* t (O ∩ O<sup>op</sup>)(R) r ⇔ t O(R) r ∧ r O(R<sup>op</sup>) t*.*

*Proof.* Let R be O-bisimilarity; then by symmetry we have R<sup>op</sup> = R. So if M R N we have N R M, and by the simulation rules we derive |M| O(R) |N| and |N| O(R) |M|, which is what we needed.

Let R be (O ∩ O<sup>op</sup>)-similarity. If M R<sup>op</sup> N then |N| (O ∩ O<sup>op</sup>)(R) |M|, so |N| O(R) |M| ∧ |M| O(R<sup>op</sup>) |N|, which results in |M| (O ∩ O<sup>op</sup>)(R<sup>op</sup>) |N|. Verifying the other simulation conditions as well, we can conclude that the symmetric closure R ∪ R<sup>op</sup> is also an (O ∩ O<sup>op</sup>)-simulation. So R must, as the largest such simulation, be symmetric. Hence R is a symmetric O-simulation as well.

For brevity, we will leave out the word "applicative" from here on, and write o to mean its denotation [o]. We also introduce brackets, writing o[φ] for o φ. The key result now is that the maximal relation, O-similarity, is in most cases the same object as our logical preorder. We first give a short lemma.

**Lemma 17.** *For any fragment* L *of* V *closed under countable conjunction, it holds that for each value* V *there is a formula* χ<sub>V</sub> ∈ L *s.t.* W |= χ<sub>V</sub> ⇔ V ⊑<sub>L</sub> W*.*

*Proof.* For each U such that ¬(V ⊑<sub>L</sub> U), choose a formula φ<sub>U</sub> ∈ L such that V |= φ<sub>U</sub> and ¬(U |= φ<sub>U</sub>). Then, if we define χ<sub>V</sub> := ⋀<sub>{U | ¬(V ⊑<sub>L</sub> U)}</sub> φ<sub>U</sub>, it holds that V ⊑<sub>L</sub> W ⇔ W |= χ<sub>V</sub>, which is what we want.

**Theorem 2 (a).** *For any family of upwards-closed modalities* O*, the logical preorder* ⊑<sub>V+</sub> *is identical to* O*-similarity.*

*Proof.* We write ⊑ instead of ⊑<sub>V+</sub> to make room for other annotations. We first prove that our logical preorder is an O-simulation by induction on types.


We can conclude that ⊑ is an O-simulation. Now take an arbitrary O-simulation R. We prove by induction on types that R ⊆ (⊑).


4. Values of **1**. If V R<sup>v</sup><sub>**1**</sub> W then V = ∗ = W, hence V ⊑<sup>v</sup><sub>**1**</sub> W.

In conclusion: any O-simulation R is a subset of the O-simulation ⊑<sub>V+</sub>. So ⊑<sub>V+</sub> is O-similarity.

Alternatively, we can look at the variant of our logic with negation, which is related to applicative bisimulation.

**Theorem 2 (b).** *For any family of upwards-closed modalities* O*, the logical equivalence* ≡<sub>V</sub> *is identical to* O*-bisimilarity.*

*Proof.* Note first that ≡<sub>V</sub> is symmetric.

Secondly, note that since ≡<sub>V</sub> = ⊑<sub>V</sub>, we know by Lemma 17 that, for any V, there is a formula χ<sub>V</sub> such that W |= χ<sub>V</sub> ⇔ V ≡<sub>V</sub> W.

Using these special formulas χ<sub>V</sub>, the rest of the proof is very similar to that of Theorem 2(a). Here follow the non-trivial parts of the proof that differ from it. For proving that ≡<sub>V</sub> is an O-simulation:


So ≡<sub>V</sub> is an O-bisimulation. Now take any O-bisimulation R.


We can conclude that (R) ⊆ (≡<sub>V</sub>), so ≡<sub>V</sub> is indeed O-bisimilarity.

We end this section by stating the abstract properties of our relational lifting O(R) required for the proof by Howe's method in Sect. 6 to go through. The necessary properties were identified in [9]. The contribution of this paper is that all the required properties follow from our modality-based definition of O(R). The first set of properties tells us that O(−) is a relator in the sense of [12]:

**Lemma 18.** *If the modalities from* O *are upwards closed, then* O(−) *is a relator, meaning that:*


The next property together with the previous lemma establishes that O(−) is a *monotone relator* in the sense of [25].

**Lemma 19.** *If the modalities from* O *are upwards closed, then* O(−) *is* monotone*, meaning for any* f : X → Z*,* g : Y → W*,* R ⊆ X × Y *and* S ⊆ Z × W*:*

$$(\forall x, y,\; x\mathcal{R}y \Rightarrow f(x)\, \mathcal{S}\, g(y)) \,\land\, t\, \mathcal{O}(\mathcal{R})\, r \;\Rightarrow\; t[x \mapsto f(x)]\; \mathcal{O}(\mathcal{S})\; r[y \mapsto g(y)]$$

The relator also interacts well with the monad structure on T.

**Lemma 20.** *If* O *is a decomposable set of upwards closed modalities, then:*

*1.* x R y ⇒ η(x) O(R) η(y)*;*
*2.* t O(O(R)) r ⇒ μt O(R) μr*.*

Finally, the following properties show that the relator behaves well with respect to the order on trees.

**Lemma 21.** *If* O *only contains Scott open modalities, then:*


The lemmas above list the core properties of the relator, which are satisfied when our family O is decomposable and contains only Scott open modalities. The results below follow from those above.

**Corollary 22.** *If* O *contains only upwards closed modalities, then:*

O *is decomposable* ⇔ ∀R ⊆ X × Y, ∀t, r ∈ T T**1**, (t O(O(R)) r ⇒ μt O(R) μr)

**Corollary 23.** *If* O *is a decomposable family of upwards closed modalities, then lifted relations are preserved by Kleisli lifting and effect operators:*

*1. Given* f : X → Z*,* g : Y → W*,* R ⊆ X × Y *and* S ⊆ Z × W*, if for all* x ∈ X *and* y ∈ Y *we have* x R y ⇒ f(x) O(S) g(y)*, and if* t O(R) r*, then* μ(t[x ↦ f(x)]) O(S) μ(r[y ↦ g(y)])*.*
*2.* (∀k, u<sub>k</sub> O(S) v<sub>k</sub>) ⇒ σ(u<sub>0</sub>, u<sub>1</sub>, ...) O(S) σ(v<sub>0</sub>, v<sub>1</sub>, ...)

Point 2 of Corollary 23 has been stated in such a way that it covers both the infinite arity case α<sup>**N**</sup> → α and the finite arity case α<sup>n</sup> → α. So it states that any lifted relation is preserved by any of the predefined algebraic effect operations.

### **6 Howe's Method**

In this section, we apply Howe's method, first developed in [5,6], to establish the compatibility of applicative (bi)similarity, and hence of the behavioural preorders. Given a relation R on terms, one defines its *Howe closure* R•, which is compatible and contains the open extension R◦. Our proof makes fundamental use of the relator properties from Sect. 5, closely following the approach of [9].

**Proposition 24.** *If* O *is a decomposable set of Scott-open modalities, then for any* O*-simulation preorder* ⊑*, the restriction of its Howe closure* ⊑<sup>•</sup> *to closed terms is an* O*-simulation.*

In the proof of the proposition, the relator properties are mainly used to show that ⊑<sup>•</sup> satisfies condition (2) of Definition 14.

We can now establish the compatibility of applicative O-similarity.

**Theorem 3 (a).** *If* O *is a decomposable set of Scott-open modalities, then the open extension of the relation of* O*-similarity is compatible.*

*Proof (sketch).* We write ⊑<sub>s</sub> for the relation of O-similarity. Since ⊑<sub>s</sub> is an O-simulation, we know by Proposition 24 that ⊑<sup>•</sup><sub>s</sub> restricted to closed terms is one as well, and hence is contained in the largest O-simulation ⊑<sub>s</sub>. Since ⊑<sup>•</sup><sub>s</sub> is compatible, it is contained in the open extension ⊑<sup>◦</sup><sub>s</sub>. We can conclude that ⊑<sup>◦</sup><sub>s</sub> is equal to the Howe closure ⊑<sup>•</sup><sub>s</sub>, which is compatible.

To prove that O-bisimilarity is compatible, we use the following result from [10] (where we write S<sup>∗</sup> for the transitive-reflexive closure of a relation S).

**Lemma 25.** *If* R◦ *is symmetric and reflexive, then* R•∗ *is symmetric.*

**Theorem 3 (b).** *If* O *is a decomposable set of Scott-open modalities, then the open extension of the relation of* O*-bisimilarity is compatible.*

*Proof (sketch).* We write O-bisimilarity as ≡<sub>b</sub>. From Proposition 24 we know that ≡<sup>•</sup><sub>b</sub> on closed terms is an O-simulation, and so ≡<sup>•∗</sup><sub>b</sub> is an O-simulation as well (using Lemma 18). Since ≡<sub>b</sub> is reflexive and symmetric, we know by the previous lemma that ≡<sup>•∗</sup><sub>b</sub> is symmetric. Hence ≡<sup>•∗</sup><sub>b</sub> is an O-bisimulation, implying (≡<sup>•∗</sup><sub>b</sub>) ⊆ (≡<sup>◦</sup><sub>b</sub>) by compatibility of ≡<sup>•∗</sup><sub>b</sub>. Since (≡<sup>◦</sup><sub>b</sub>) ⊆ (≡<sup>•</sup><sub>b</sub>) ⊆ (≡<sup>•∗</sup><sub>b</sub>), we have that (≡<sup>•∗</sup><sub>b</sub>) = (≡<sup>◦</sup><sub>b</sub>), and we can conclude that ≡<sup>◦</sup><sub>b</sub> is compatible.

Theorem 1 is an immediate consequence of Theorems 2 and 3.

### **7 Pure Behavioural Logic**

In this section, we briefly explore an alternative formulation of our logic, with both conceptual and practical motivations. Our approach to behavioural logic fits into the category of *endogenous* logics in the sense of Pnueli [24]. Formulas (φ and Φ) express properties of individual programs, through satisfaction relations (V |= φ and M |= Φ). Programs are thus considered as 'models' of the logic, with the satisfaction relation being defined via program behaviour.

It is conceptually appealing to push the separation between program and logic to its natural conclusion, and ask for the syntax of the logic to be independent of the syntax of the programming language. Indeed, it seems natural that it should be possible to express properties of program behaviour without knowledge of the syntax of the programming language. Under our formulation of the logic V, this desideratum is violated by the value formula (V ↦ Φ) at function type, which mentions the programming-language value V.

This issue can be addressed by replacing the basic value formula (V ↦ Φ) with the alternative (φ ↦ Ψ), already mentioned in Sect. 3. Such a change also has a practical motivation: the formula (φ ↦ Ψ) declares a precondition and postcondition for function application, supporting a useful specification style.

**Definition 26.** The *pure behavioural logic* F is defined by replacing rule (2) in Fig. 2 with the alternative:

$$\frac{\phi \in VF(\rho) \qquad \Psi \in CF(\tau)}{(\phi \mapsto \Psi) \in VF(\rho \to \tau)}\,(2^\*)$$

The semantics is modified by defining V |= (φ ↦ Ψ) using formula (2) of Sect. 3.

**Proposition 27.** *If the open extension of* ≡<sub>V</sub> *is compatible, then the logics* V *and* F *are equi-expressive. Similarly, if the open extension of* ⊑<sub>V+</sub> *is compatible, then the positive fragments* V<sup>+</sup> *and* F<sup>+</sup> *are equi-expressive.*

*Proof.* The definition of (φ ↦ Ψ) within V, given in (1) of Sect. 3, can be used as the basis of an inductive translation from F to V (and from F<sup>+</sup> to V<sup>+</sup>).

For the reverse translation, whose correctness proof is more interesting, we give a little more detail. Every value/computation formula φ/Φ of V is inductively translated to a corresponding formula φ̂/Φ̂ of F. The interesting case is:

$$(\widehat{V \mapsto \Phi}) \ := \ (\psi\_V \mapsto \widehat{\Phi}),$$

where ψ<sub>V</sub> is a formula such that: V |=<sub>F</sub> ψ<sub>V</sub>; and, for any ψ, if V |=<sub>F</sub> ψ then ψ<sub>V</sub> → ψ (meaning that V′ |=<sub>F</sub> ψ<sub>V</sub> implies V′ |=<sub>F</sub> ψ, for all V′). Such a formula ψ<sub>V</sub> is easily constructed as a countable conjunction (cf. Lemma 17). One then proves, by induction on types, that the F-semantics of φ̂ (resp. Φ̂) coincides with the V-semantics of φ (resp. Φ). In the case of (V ↦ Φ), the induction hypothesis is used to establish that any V′ satisfying V′ |=<sub>F</sub> ψ<sub>V</sub> enjoys the property that V′ ≡<sub>V</sub> V. It then follows from the compatibility of ≡<sub>V</sub> that W V ≡<sub>V</sub> W V′, for any W of appropriate type, whence W V ≡<sub>F</sub> W V′. The rest of the proof can easily be erected around these observations.

Combining the above proposition with Theorem 1, we obtain the following.

**Corollary 28.** *Suppose* O *is a decomposable family of Scott-open modalities. Then* ≡<sub>F</sub> *coincides with* ≡<sub>V</sub>*, and* ⊑<sub>F+</sub> *coincides with* ⊑<sub>V+</sub>*. Hence the open extensions of* ≡<sub>F</sub> *and* ⊑<sub>F+</sub> *are compatible.*

We do not know any proof of the compatibility of the ≡<sub>F</sub> and ⊑<sub>F+</sub> relations that does not go via the logic V. In particular, the compatibility property of the **fix** operator seems difficult to establish directly for ≡<sub>F</sub> and ⊑<sub>F+</sub>.

### **8 Discussion and Related Work**

The behavioural logics considered in this paper are designed for the purpose of clarifying the notion of 'behavioural property', and for defining behavioural equivalence. As infinitary propositional logics, they are not directly suited to practical applications such as specification and verification. Nevertheless, they serve as low-level logics into which more practical finitary logics can be translated. For this, the closure of the logics under infinitary propositional logic is important. For example, there are standard translations of quantifiers and least and greatest fixed points into infinitary propositional logic. Also, in the case of global store, Hoare triples translate into logical combinations of modal formulas.

Our approach, of basing logics for effects on behavioural modalities, may potentially inform the design of practical logics for specifying and reasoning about effects. For example, Pitts' *evaluation logic* was an early logic for general computational effects [18]. In the light of the general theory of modalities in the present paper, it seems natural to replace the built-in □ and ♦ modalities of evaluation logic with effect-specific modalities, as in Sect. 3.

The *logic for algebraic effects*, of Plotkin and Pretnar [23], axiomatises effectful behaviour by means of an equational theory over the signature of effect operations, following the algebraic approach to effects advocated by Plotkin and Power [22]. Such equational axiomatisations are typically sound with respect to more than one notion of program equivalence. The logic of [23] can thus be used to soundly reason about program equivalence, but does not in itself determine a notion of program equivalence. Instead, our logic is specifically designed as a vehicle for defining program equivalence. In doing so, our modalities can be viewed as a chosen family of 'observations' that are compatible with the effects present in the language. It is the choice of modalities that determines the equational properties that the effect operations satisfy.

The logic of [23] itself makes use of modalities, called *operation modalities*, each associated with a single effect operation in Σ. It would be natural to replace these modalities, which are syntactic in nature, with behavioural modalities of the form we consider. Similarly, our behavioural modalities appear to offer a promising basis for developing a modality-based refinement-type system for algebraic effects. In general, an important advantage we see in the use of behavioural modalities is that our notion of *strong decomposability* appears related to the availability of compositional proof principles for modal properties. This is a promising avenue for future exploration.

A rather different approach to logics for effects has been proposed by Goncharov, Mossakowski and Schröder [3,16]. They assume a semantic setting in which the programming language is rich enough to contain a *pure fragment* that itself acts as a program logic. This approach is very powerful for certain effects. For example, Hoare logic can be derived in the case of global store. However, it appears to be less widely adaptable across the range of effects than our approach.

Our logics exhibit certain similarities in form with the endogenous logic developed in Abramsky's *domain theory in logical form* [2]. Our motivation and approach are, however, quite different. Whereas Abramsky shows the usefulness of an axiomatic approach to a finitary logic as a way of characterising denotational equality, the present paper shows that there is a similar utility in considering an infinitary logic from a semantic perspective (based on operational semantics) as a method of defining behavioural equivalence.

The work in this paper has been carried out for fine-grained call-by-value [13], which is equivalent to call-by-value. The definitions can, however, be adapted to work for call-by-name, and even call-by-push-value [11]. Adding type constructors such as sum and product is also straightforward. We have not checked the generalisation to arbitrary recursive types, but we do not foresee any problem.

An omission from the present paper is that we have not said anything about *contextual equivalence*, which is often taken to be the default equivalence for applicative languages. In addition to determining the logically defined preorders/equivalences, the choice of the set O of modalities gives rise to a natural definition of *contextual preorder*, namely the largest compatible preorder that, on computations of unit type **1**, is contained in the relation from Sect. 4. The compatibility of ⊑<sub>V+</sub> established in the present paper means that we have the expected relation inclusions ≡<sub>V</sub> ⊆ ⊑<sub>V+</sub> ⊆ ⊑<sub>ctxt</sub>. It is an interesting question whether the logic can be restricted to characterise contextual equivalence/preorder. A more comprehensive investigation of contextual equivalence is being undertaken, in ongoing work, by Aliaume Lopez and the first author.

The crucial notion of modality, in the present paper, was adapted from the notion of *observation* in [8]. The change from a set of trees of type **N** (an observation) to a set of unit-type trees (a modality) allows value formulas to be lifted to computation formulas, analogously to *predicate lifting* in coalgebra [7], which is a key characteristic of our modalities. Properties of *Scott-openness* and *decomposability* play a similar role in the present paper to the role they play in [8]. However, the notion of decomposability for modalities (Definition 11) is more subtle than the corresponding notion for observations in [8].

There are certain limitations to the theory of modalities in the present paper. For example, for the combination of probability and nondeterminism, one might naturally consider modalities ♦P<sub>r</sub> and □P<sub>r</sub> asserting the possibility and necessity of the termination probability exceeding r. However, the decomposability property fails. It appears that this situation can be rescued by changing to a quantitative logic, with a corresponding notion of quantitative modality. This is a topic of ongoing research.

**Acknowledgements.** We thank Francesco Gavazzo, Aliaume Lopez and the anonymous referees for helpful discussions and comments.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Explicit Effect Subtyping**

Amr Hany Saleh<sup>1(B)</sup>, Georgios Karachalias<sup>1</sup>, Matija Pretnar<sup>2</sup>, and Tom Schrijvers<sup>1</sup>

<sup>1</sup> Department of Computer Science, KU Leuven, Leuven, Belgium ah.saleh@cs.kuleuven.be

<sup>2</sup> Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia

**Abstract.** As popularity of algebraic effects and handlers increases, so does a demand for their efficient execution. Eff, an ML-like language with native support for handlers, has a subtyping-based effect system on which an effect-aware optimizing compiler could be built. Unfortunately, in our experience, implementing optimizations for Eff is overly error-prone because its core language is implicitly-typed, making code transformations very fragile.

To remedy this, we present an explicitly-typed polymorphic core calculus for algebraic effect handlers with a subtyping-based type-and-effect system. It reifies appeals to subtyping in explicit casts with coercions that witness the subtyping proof, quickly exposing typing bugs in program transformations.

Our typing-directed elaboration comes with a constraint-based inference algorithm that turns an implicitly-typed Eff-like language into our calculus. Moreover, all coercions and effect information can be erased in a straightforward way, demonstrating that coercions have no computational content.

### **1 Introduction**

Algebraic effect handlers [17,18] are quickly maturing from a theoretical model to a practical language feature for user-defined computational effects. Yet, in practice they still incur a significant performance overhead compared to native effects.

Our earlier efforts [22] to narrow this gap with an optimising compiler from Eff [2] to OCaml showed promising results, in some cases reaching even the performance of hand-tuned code, but were very fragile and have been postponed until a more robust solution is found. We believe the main reason behind this fragility is the complexity of subtyping in combination with the implicit typing of Eff's core language, further aggravated by the "garbage collection" of subtyping constraints (see Sect. 7).<sup>1</sup>

<sup>1</sup> For other issues stemming from the same combination see issues #11 and #16 at https://github.com/matijapretnar/eff/issues/.

© The Author(s) 2018

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 327–354, 2018. https://doi.org/10.1007/978-3-319-89884-1_12

For efficient compilation, one must avoid the poisoning problem [26], where unification forces a pure computation to take the less precise impure type of the context (e.g. a pure and an impure branch of a conditional both receive the same impure type). Since this rules out existing (and likely simpler) effect systems for handlers based on row-polymorphism [8,12,14], we propose a polymorphic explicitly-typed calculus based on subtyping. More specifically, our contributions are as follows:


The full version of this paper includes an appendix with omitted figures and can be found at http://www.cs.kuleuven.be/publicaties/rapporten/cw/CW711.abs.html.

### **2 Overview**

This section presents an informal overview of the ExEff calculus, and the main issues with elaborating to and erasing from it.

<sup>2</sup> https://github.com/matijapretnar/eff/tree/explicit-effect-subtyping.

<sup>3</sup> https://github.com/matijapretnar/proofs/tree/master/explicit-effect-subtyping.

### **2.1 Algebraic Effect Handlers**

The main premise of algebraic effects is that impure behaviour arises from a set of *operations* such as Get and Set for mutable store, Read and Print for interactive input and output, or Raise for exceptions [17]. This allows generalizing exception handlers to other effects, to express backtracking, co-operative multithreading and other examples in a natural way [2,18].

Assume operations Tick : Unit → Unit and Tock : Unit → Unit that take a unit value as a parameter and yield a unit value as a result. Unlike special built-in operations, these operations have no intrinsic effectful behaviour, though we can give one through handlers. For example, the handler {Tick x k → (Print "tick"; k unit), Tock x k → Print "tock"} replaces all calls of Tick by printing out "tick" and similarly for Tock. But there is one significant difference between the two cases. Unlike exceptions, which always abort the evaluation, operations have a continuation waiting for their result. It is this continuation that the handler captures in the variable k and potentially uses in the handling clause. In the clause for Tick, the continuation is resumed by passing it the expected unit value, whereas in the clause for Tock, the operation is discarded. Thus, if we handle a computation emitting the two operations, it will print out "tick" until a first "tock" is printed, after which the evaluation stops.
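This handling behaviour can be simulated outside Eff. The following sketch is our own illustration (a hypothetical Python encoding, not Eff's actual semantics): a computation is a generator yielding (operation, argument) pairs, and the suspended generator plays the role of the continuation k.

```python
# A computation yields (operation, argument) pairs; the suspended
# generator itself plays the role of the continuation k.
log = []

def handle(gen, clauses):
    """Dispatch each operation yielded by `gen` to its handler clause.
    A clause receives (arg, k), where k(v) resumes the computation."""
    def k(value=None):
        try:
            op, arg = gen.send(value)
        except StopIteration:
            return
        clauses[op](arg, k)
    k()  # start the computation

def ticker():
    for op in ["Tick", "Tick", "Tock", "Tick"]:
        yield (op, ())

handle(ticker(), {
    # Tick x k -> (Print "tick"; k unit): record "tick", then resume
    "Tick": lambda x, k: (log.append("tick"), k(()))[0],
    # Tock x k -> Print "tock": record "tock"; continuation discarded
    "Tock": lambda x, k: log.append("tock"),
})
print(log)  # evaluation stops at the first Tock
```

As in the handler above, the clause for Tock never invokes k, so the fourth Tick is never performed.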

### **2.2 Elaborating Subtyping**

Consider the computation do x ← Tick unit; f x and assume that f has the function type Unit → Unit ! {Tock}, taking unit values to unit values and perhaps calling Tock operations in the process. The whole computation then has the type Unit ! {Tick, Tock} as it returns the unit value and may call Tick and Tock.

The above typing implicitly appeals to subtyping in several places. For instance, Tick unit has type Unit ! {Tick} and f x type Unit ! {Tock}. Yet, because they are sequenced with do, the type system expects them to have the same set of effects. The discrepancies are implicitly reconciled by the subtyping, which admits both {Tick} ⩽ {Tick, Tock} and {Tock} ⩽ {Tick, Tock}.

We elaborate the ImpEff term into the explicitly-typed core language ExEff to make those appeals to subtyping explicit by means of casts with coercions:

$$\mathbf{do}\ x \leftarrow ((\mathbf{Tick}\ \mathsf{unit}) \rhd \gamma_1); (f\ x) \rhd \gamma_2$$

A coercion γ is a witness for a subtyping A ! Δ ⩽ A′ ! Δ′ and can be used to cast a term c of type A ! Δ to a term c ▷ γ of type A′ ! Δ′. In the above term, γ<sub>1</sub> and γ<sub>2</sub> respectively witness Unit ! {Tick} ⩽ Unit ! {Tick, Tock} and Unit ! {Tock} ⩽ Unit ! {Tick, Tock}.
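To make the role of such witnesses concrete, here is a minimal sketch in a toy encoding of our own (not part of ExEff): a computation type A ! Δ is a pair of a value type name and a dirt set, and a coercion can only be constructed when its target dirt contains its source dirt, so a cast can widen but never forget effects.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompTy:
    """A computation type A ! Δ: a value type name and a dirt set."""
    val: str
    dirt: frozenset

@dataclass(frozen=True)
class Coercion:
    """A witness for src ⩽ tgt; construction checks the subtyping."""
    src: CompTy
    tgt: CompTy
    def __post_init__(self):
        assert self.src.val == self.tgt.val    # toy model: no value subtyping
        assert self.src.dirt <= self.tgt.dirt  # dirts may only grow

def cast(ty, gamma):
    """Model c ▷ γ at the type level: retype a term of type γ.src."""
    assert ty == gamma.src, "cast applied at the wrong type"
    return gamma.tgt

tick = CompTy("Unit", frozenset({"Tick"}))
both = CompTy("Unit", frozenset({"Tick", "Tock"}))
g1 = Coercion(tick, both)      # witnesses Unit ! {Tick} ⩽ Unit ! {Tick, Tock}
print(cast(tick, g1) == both)  # True
```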

### **2.3 Polymorphic Subtyping for Types and Effects**

The above basic example only features monomorphic types and effects. Yet, our calculus also supports polymorphism, which makes it considerably more expressive. For instance the type of *<sup>f</sup>* in let *<sup>f</sup>* = (fun g → g unit) in ... is generalised to:

$$
\forall \alpha, \alpha'. \forall \delta, \delta'. \alpha \leqslant \alpha' \Rightarrow \delta \leqslant \delta' \Rightarrow (\mathsf{Unit} \to \alpha \mathrel{!} \ \delta) \to \alpha' \mathrel{!} \ \delta'
$$

This polymorphic type scheme follows the qualified types convention [9] where the type (Unit → α ! δ) → α′ ! δ′ is subjected to several qualifiers, in this case α ⩽ α′ and δ ⩽ δ′. The universal quantifiers on the outside bind the type variables α and α′, and the effect set variables δ and δ′.

The elaboration of f into ExEff introduces explicit binders for both the quantifiers and the qualifiers, as well as the explicit casts where subtyping is used.

$$\Lambda\alpha.\Lambda\alpha'.\Lambda\delta.\Lambda\delta'.\Lambda(\omega:\alpha\leqslant\alpha').\Lambda(\omega':\delta\leqslant\delta').\,\mathsf{fun}\ (g:\mathsf{Unit}\to\alpha \mathbin{!} \delta)\mapsto(g\ \mathsf{unit})\rhd(\omega \mathbin{!} \omega')$$

Here the binders for qualifiers introduce coercion variables ω between pure types and ω′ between operation sets, which are then combined into a computation coercion ω ! ω′ and used for casting the function application g unit to the expected type.

Suppose that h has type Unit → Unit ! {Tick} and f h type Unit ! {Tick, Tock}. In the ExEff calculus the corresponding instantiation of f is made explicit through type and coercion applications:

$$f\ \mathsf{Unit}\ \mathsf{Unit}\ \{\mathsf{Tick}\}\ \{\mathsf{Tick},\mathsf{Tock}\}\ \gamma_1\ \gamma_2\ h,$$

where γ<sub>1</sub> needs to be a witness for Unit ⩽ Unit and γ<sub>2</sub> for {Tick} ⩽ {Tick, Tock}.

### **2.4 Guaranteed Erasure with Skeletons**

One of our main requirements for ExEff is that its effect information and subtyping can be easily erased. The reason is twofold. Firstly, we want to show that neither plays a role in the runtime behaviour of ExEff programs. Secondly and more importantly, we want to use a conventionally typed (System F-like) functional language as a backend for the Eff compiler.

At first, erasure of both effect information and subtyping seems easy: simply drop that information from types and terms. But by dropping the effect variables and subtyping constraints from the type of *f*, we get ∀α, α′.(Unit → α) → α′ instead of the expected type ∀α.(Unit → α) → α. In our naive erasure attempt we have carelessly discarded the connection between α and α′. A more appropriate approach to erasure would be to unify the types in dropped subtyping constraints. However, unifying types may reduce the number of type variables when they become instantiated, so corresponding binders need to be dropped, greatly complicating the erasure procedure and its meta-theory.

Fortunately, there is an easier way: tagging all bound type variables with *skeletons*, which are barebone types without effect information. For example, the skeleton of a function type A → B ! Δ is τ<sub>1</sub> → τ<sub>2</sub>, where τ<sub>1</sub> is the skeleton of A and τ<sub>2</sub> the skeleton of B. In ExEff every well-formed type has an associated skeleton, and any two types A<sub>1</sub> ⩽ A<sub>2</sub> share the same skeleton. In particular, binders for type variables are explicitly annotated with skeleton variables ς. For instance, the actual type of *f* is:

$$\forall \varsigma. \forall (\alpha : \varsigma), (\alpha' : \varsigma). \forall \delta, \delta'. \alpha \leqslant \alpha' \Rightarrow \delta \leqslant \delta' \Rightarrow (\mathsf{Unit} \to \alpha \mathrel{!} \ \delta) \to \alpha' \mathrel{!} \ \delta'$$

The skeleton quantifications and annotations also appear at the term-level:

$$\Lambda\varsigma.\Lambda(\alpha:\varsigma).\Lambda(\alpha':\varsigma).\Lambda\delta.\Lambda\delta'.\Lambda(\omega:\alpha\leqslant\alpha').\Lambda(\omega':\delta\leqslant\delta').\ldots$$

Now erasure is really easy: we drop not only effect and subtyping-related term formers, but also type binders and application. We do retain skeleton binders and applications, which take over the role of (plain) types in the backend language. In terms, we replace types by their skeletons. For instance, for *f* we get:

```
Λς.fun (g : Unit → ς) → g unit : ∀ς.(Unit → ς) → ς
```
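The skeleton computation itself is a simple recursive erasure. The sketch below uses a hypothetical tuple encoding of our own (not the paper's syntax): a function type A → B ! Δ carries its result as a pair (B, Δ), and erasure keeps the shape while dropping every dirt.

```python
def skeleton(ty):
    """Erase dirt information from a toy-encoded value type."""
    tag = ty[0]
    if tag in ("Unit", "var"):           # base cases keep their shape
        return ty
    if tag == "arrow":                   # A -> B ! Δ   erases to  τ1 -> τ2
        _, a, (b, _dirt) = ty
        return ("arrow", skeleton(a), skeleton(b))
    if tag == "hand":                    # A ! Δ1 ⇛ B ! Δ2  erases to  τ1 ⇛ τ2
        _, (a, _d1), (b, _d2) = ty
        return ("hand", skeleton(a), skeleton(b))
    raise ValueError(f"unknown type form: {tag}")

# Unit -> (Unit -> Unit ! {Tick}) ! {Tock}
t = ("arrow", ("Unit",),
     (("arrow", ("Unit",), (("Unit",), {"Tick"})), {"Tock"}))
print(skeleton(t))  # ('arrow', ('Unit',), ('arrow', ('Unit',), ('Unit',)))
```

Because any two types related by ⩽ differ only in their dirts, they erase to the same skeleton, which is what makes this erasure well behaved.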
### **3 The ImpEff Language**

This section presents ImpEff, a basic functional calculus with support for algebraic effect handlers, which forms the core language of our optimising compiler. We describe the relevant concepts, but refer the reader to Pretnar's tutorial [21], which explains essentially the same calculus in more detail.

### **3.1 Syntax**

Figure 1 presents the syntax of the source language. There are two main kinds of terms: (pure) values v and (dirty) computations c, which may call effectful operations. Handlers h are a subsidiary sort of values. We assume a given set of *operations* Op, such as Get and Put. We abbreviate Op<sub>1</sub> x k → c<sub>Op<sub>1</sub></sub>, ..., Op<sub>n</sub> x k → c<sub>Op<sub>n</sub></sub> as [Op x k → c<sub>Op</sub>]<sub>Op∈O</sub>, and write O to denote the set {Op<sub>1</sub>, ..., Op<sub>n</sub>}.

Similarly, we distinguish between two basic sorts of types: the value types A, B and the computation types C, D. There are four forms of value types: type variables α, function types A → C, handler types C ⇛ D and the Unit type. Skeletons τ capture the shape of types, so, by design, their forms are identical. The computation type A ! Δ is assigned to a computation returning values of type A and potentially calling operations from the *dirt* set Δ. A dirt set contains zero or more operations Op and is terminated either by an empty set or a dirt variable δ. Though we use cons-list syntax, the intended semantics of dirt sets Δ is that the order of operations Op is irrelevant. Similarly to all HM-based systems, we discriminate between value types (or monotypes) A, qualified types *K* and polytypes (or type schemes) *S*. (Simple) subtyping constraints π denote inequalities between either value types or dirts. We also present the more general form of constraints ρ that includes inequalities between computation types (as we illustrate in Sect. 3.2 below, this allows for a single, uniform constraint entailment relation). Finally, polytypes consist of zero or more skeleton, type or dirt abstractions followed by a qualified type.

### **3.2 Typing**

Figure 2 presents the typing rules for values and computations, along with a typing-directed elaboration into our target language ExEff. In order to simplify the presentation, in this section we focus exclusively on typing. The parts of the rules that concern elaboration are highlighted in gray and are discussed in Sect. 5.

**Values.** Typing for values takes the form Γ ⊢<sub>v</sub> v : *A* ⇝ v′, and, given a typing environment Γ, checks a value v against a value type A.

Rule TmVar handles term variables. Given that x has type (∀ς.∀(α : τ).∀δ.π ⇒ *A*), we *appropriately* instantiate the skeleton (ς), type (α), and dirt (δ) variables, and ensure that the instantiated wanted constraints σ(π) are satisfied, via the side condition Γ ⊢<sub>co</sub> γ : σ(π). Rule TmCastV allows casting the type of a value v from *A* to *B*, if *A* is a subtype of *B* (upcasting). As illustrated by Rule TmTmAbs, we omit freshness conditions by adopting the Barendregt convention [1]. Finally, Rule TmHand gives typing for handlers. It requires that the right-hand sides of the return clause and all operation clauses have the same computation type (*B* ! Δ), and that all operations mentioned are part of the top-level signature Σ.<sup>4</sup> The result type takes the form *A* ! Δ ∪ O ⇛ *B* ! Δ, capturing the intended handler semantics: given a computation of type *A* ! Δ ∪ O, the handler (a) produces a result of type *B*, (b) handles operations O, and (c) propagates unhandled operations Δ to the output.

<sup>4</sup> We capture all defined operations along with their types in a global signature Σ.

**Fig. 2.** ImpEff Typing & Elaboration

**Computations.** Typing for computations takes the form Γ ⊢<sub>c</sub> c : *C* ⇝ c′, and, given a typing environment Γ, checks a computation c against a type C.

Rule TmCastC behaves like Rule TmCastV, but for computation types. Rule TmLet handles polymorphic, non-recursive let-bindings. Rule TmReturn handles return v computations. Keyword return effectively lifts a value v of type *A* into a computation of type *A* ! ∅. Rule TmOp checks operation calls. First, we ensure that v has the appropriate type, as specified by the signature of Op. Then, the continuation (y.c) is checked. The side condition Op ∈ Δ ensures that the called operation Op is captured in the result type. Rule TmDo handles sequencing. Given that c<sub>1</sub> has type *A* ! Δ, the pure part of the result of type *A* is bound to term variable x, which is brought in scope for checking c<sub>2</sub>. As we mentioned in Sect. 2, all computations in a do-construct should have the same effect set, Δ. Rule TmHandle eliminates handler types, just as Rule TmTmApp eliminates arrow types.

**Constraint Entailment.** The specification of constraint entailment takes the form Γ ⊢<sub>co</sub> γ : ρ and is presented in Fig. 3. Notice that we use ρ instead of π, which allows us to capture subtyping between two value types, computation types or dirts, within the same relation. Subtyping can be established in several ways:

Rule CoVar handles given assumptions. Rules VCoRefl and DCoRefl express that subtyping is reflexive, for both value types and dirts. Notice that we do not have a rule for the reflexivity of computation types since, as we illustrate below, it can be established using the reflexivity of their subparts. Rules VCoTrans, CCoTrans and DCoTrans express the transitivity of subtyping for value types, computation types and dirts, respectively. Rule VCoArr establishes inequality of arrow types. As usual, the arrow type constructor is contravariant in the argument type. Rules VCoArrL and CCoArrR are the inversions of Rule VCoArr, allowing us to establish the relation between the subparts of the arrow types. Rules VCoHand, CCoHL, and CCoHR work similarly, for handler types. Rule CCoComp captures the covariance of type constructor (!), establishing subtyping between two computation types if subtyping is established for their respective subparts. Rules VCoPure and DCoImpure are its inversions. Finally, Rules DCoNil and DCoOp establish subtyping between dirts. Rule DCoNil captures that the empty dirt set ∅ is a subdirt of any dirt Δ and Rule DCoOp expresses that dirt subtyping is preserved under extension with the same operation Op.

**Well-Formedness of Types, Constraints, Dirts, and Skeletons.** The relations Γ ⊢<sub>vty</sub> *A* : τ ⇝ *T* and Γ ⊢<sub>cty</sub> *C* : τ ⇝ *C′* check the well-formedness of value and computation types, respectively. Similarly, relations Γ ⊢<sub>ct</sub> ρ ⇝ ρ′ and Γ ⊢<sub>Δ</sub> Δ check the well-formedness of constraints and dirts, respectively.

**Fig. 3.** ImpEff Constraint Entailment

### **4 The ExEff Language**

#### **4.1 Syntax**

Figure 4 presents ExEff's syntax. ExEff is an intensional type theory akin to System F [7], where every term encodes its own typing derivation. In essence, all abstractions and applications that are implicit in ImpEff are made explicit in ExEff via new syntactic forms. Additionally, ExEff is impredicative, which is reflected in the lack of discrimination between value types, qualified types and type schemes; all non-computation types are denoted by *T*. While the impredicativity is not strictly required for the purpose at hand, it makes for a cleaner system.

**Coercions.** Of particular interest is the use of explicit *subtyping coercions*, denoted by γ. ExEff uses these to replace the implicit casts of ImpEff (Rules TmCastV and TmCastC in Fig. 2) with explicit casts (v ▷ γ) and (c ▷ γ).

Essentially, coercions γ are explicit witnesses of subtyping derivations: each coercion form corresponds to a subtyping rule. Subtyping forms a partial order, which is reflected in coercion forms γ<sub>1</sub> ≫ γ<sub>2</sub>, ⟨*T*⟩, and ⟨Δ⟩. Coercion form γ<sub>1</sub> ≫ γ<sub>2</sub> captures transitivity, while forms ⟨*T*⟩ and ⟨Δ⟩ capture reflexivity for value types and dirts (reflexivity for computation types can be derived from these).

Subtyping for skeleton abstraction, type abstraction, dirt abstraction, and qualification is witnessed by forms ∀ς.γ, ∀α.γ, ∀δ.γ, and π ⇒ γ, respectively. Similarly, forms γ[τ], γ[*T*], γ[Δ], and γ<sub>1</sub>@γ<sub>2</sub> witness subtyping of skeleton instantiation, type instantiation, dirt instantiation, and coercion application, respectively.

Syntactic forms γ<sub>1</sub> → γ<sub>2</sub> and γ<sub>1</sub> ⇛ γ<sub>2</sub> capture injection for the arrow and the handler type constructor, respectively. Similarly, inversion forms *left*(γ) and *right*(γ) capture projection, following from the injectivity of both type constructors.

Coercion form γ<sub>1</sub> ! γ<sub>2</sub> witnesses subtyping for computation types, using proofs for their components. Inversely, syntactic forms *pure*(γ) and *impure*(γ) witness subtyping between the value- and dirt-components of a computation coercion.

Finally, coercion forms ∅<sub>Δ</sub> and {Op} ∪ γ are concerned with dirt subtyping. Form ∅<sub>Δ</sub> witnesses that the empty dirt ∅ is a subdirt of any dirt Δ. Lastly, coercion form {Op} ∪ γ witnesses that subtyping between dirts is preserved under extension with a new operation. Note that we do not have an inversion form to extract a witness for Δ<sub>1</sub> ⩽ Δ<sub>2</sub> from a coercion for {Op} ∪ Δ<sub>1</sub> ⩽ {Op} ∪ Δ<sub>2</sub>. The reason is that dirt sets are sets and not inductive structures. For instance, for Δ<sub>1</sub> = {Op} and Δ<sub>2</sub> = ∅ the latter subtyping holds, but the former does not.
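The counterexample can be checked directly with sets. In this tiny sketch (plain Python sets standing in for dirts, purely for illustration), the extended inclusion holds while the unextended one fails, so an inversion form would let us forge an invalid witness.

```python
d1, d2 = {"Op"}, set()                     # Δ1 = {Op}, Δ2 = ∅

extended = ({"Op"} | d1) <= ({"Op"} | d2)  # {Op} ∪ Δ1 ⩽ {Op} ∪ Δ2 holds
plain = d1 <= d2                           # ...but Δ1 ⩽ Δ2 fails

print(extended, plain)  # True False
```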

### **4.2 Typing**

**Value and Computation Typing.** Typing for ExEff values and computations is presented in Figs. 5 and 6 and is given by two mutually recursive relations of the form Γ ⊢<sub>v</sub> v : *T* (values) and Γ ⊢<sub>c</sub> c : *C* (computations). ExEff typing environments Γ contain bindings for variables of all sorts:

$$\Gamma ::= \epsilon \mid \Gamma, \varsigma \mid \Gamma, \alpha: \tau \mid \Gamma, \delta \mid \Gamma, x: T \mid \Gamma, \omega: \pi$$

Typing is entirely syntax-directed. Apart from the typing rules for skeleton, type, dirt, and coercion abstraction (and, subsequently, skeleton, type, dirt, and coercion application), the main difference between typing for ImpEff and ExEff lies in the explicit cast forms, (v ▷ γ) and (c ▷ γ). Given that a value v has type *T*<sub>1</sub> and that γ is a proof that *T*<sub>1</sub> is a subtype of *T*<sub>2</sub>, we can upcast v with an explicit cast operation (v ▷ γ). Upcasting for computations works analogously.

**Well-Formedness of Types, Constraints, Dirts and Skeletons.** The definitions of the judgements that check the well-formedness of ExEff value types (Γ ⊢<sub>T</sub> *T* : τ), computation types (Γ ⊢<sub>C</sub> *C* : τ), dirts (Γ ⊢<sub>Δ</sub> Δ), and skeletons (Γ ⊢<sub>τ</sub> τ) are equally straightforward as those for ImpEff.

**Coercion Typing.** Coercion typing formalizes the intuitive interpretation of coercions we gave in Sect. 4.1 and takes the form Γ ⊢<sub>co</sub> γ : ρ. It is essentially an extension of the constraint entailment relation of Fig. 3.

### **4.3 Operational Semantics**

Figure 7 presents selected rules of ExEff's small-step, call-by-value operational semantics. For lack of space, we omit β-rules and other common rules and focus only on cases of interest.

Firstly, one of the non-conventional features of our system lies in the stratification of results into plain results and cast results:


```
terminal value      vT ::= unit | h | fun x : T → c | Λα : τ.v | Λδ.v | λω : π.v
value result        vR ::= vT | vT ▷ γ
computation result  cR ::= return vT | (return vT) ▷ γ | Op vR (y : T.c)
```

Terminal values v<sub>T</sub> represent conventional values, and value results v<sub>R</sub> can either be plain terminal values v<sub>T</sub> or terminal values with a cast: v<sub>T</sub> ▷ γ. The same applies to computation results c<sub>R</sub>.<sup>5</sup>

Although unusual, this stratification can also be found in Crary's coercion calculus for inclusive subtyping [4] and, more recently, in System FC [25]. Stratification is crucial for ensuring type preservation. Consider for example the expression

⁵ Observe that operation values do not feature an outermost cast operation, as the coercion can always be pushed into its continuation.

(return 5) ▷ γ, of type int ! {Op}, where γ witnesses int ! ∅ ≤ int ! {Op}. We cannot reduce the expression further without losing effect information; removing the cast would result in the computation (return 5), of type int ! ∅. Even if we consider type preservation only up to subtyping, the redex may still occur as a subterm in a context that expects solely the larger type.

Secondly, we need to make sure that casts do not stand in the way of evaluation. This is captured in the so-called "push" rules, all of which appear in Fig. 7.

In the value reduction relation v ⤳ v′, the first rule groups nested casts into a single cast, by means of transitivity. The next three rules capture the essence of push rules: whenever a redex is "blocked" due to a cast, we take the coercion apart and redistribute it (in a type-preserving manner) over the subterms, so that evaluation can progress.

The situation in the computation reduction relation c ⤳ c′ is quite similar. The first rule uses transitivity to group nested casts into a single cast. The second rule is a push rule for β-reduction. The third rule pushes a cast out of a return-computation. The fourth rule pushes a coercion inside an operation-computation, illustrating why the syntax for cR does not require casts on operation-computations. The fifth rule is a push rule for sequencing computations and performs two tasks at once. Since we know that the computation bound to x calls no operations, we (a) safely "drop" the impure part of γ, and (b) substitute x with vT, cast with the pure part of γ (so that types are preserved). The sixth rule handles operation calls in sequencing computations. If an operation is called in a sequencing computation, evaluation is suspended and the rest of the computation is captured in the continuation.

The last four rules are concerned with effect handling. The first of them pushes a coercion on the handler "outwards", such that the handler can be exposed and evaluation is not stuck (similarly to the push rule for term application). The second rule behaves similarly to the push/beta rule for sequencing computations. Finally, the last two rules are concerned with handling of operations. The first of the two captures cases where the called operation is handled by the handler, in which case the respective clause of the handler is called. As illustrated by the rule, like Pretnar [20], ExEff features *deep handlers*: the continuation is also wrapped within a with-handle construct. The last rule captures cases where the operation is not covered by the handler and thus remains unhandled.

We have shown that ExEff is type safe:

### **Theorem 1 (Type Safety)**

– *If* Γ ⊢v v : T *then either* v *is a value result or* v ⤳ v′ *and* Γ ⊢v v′ : T*.*

– *If* Γ ⊢c c : C *then either* c *is a computation result or* c ⤳ c′ *and* Γ ⊢c c′ : C*.*

**Fig. 7.** ExEff Operational Semantics (Selected Rules)

### **5 Type Inference and Elaboration**

This section presents the typing-directed elaboration of ImpEff into ExEff. This elaboration makes all the implicit type and effect information explicit, and introduces explicit term-level coercions to witness the use of subtyping.

After covering the declarative specification of this elaboration, we present a constraint-based algorithm to infer ImpEff types and at the same time elaborate into ExEff. This algorithm alternates between two phases: (1) the syntax-directed generation of constraints from the ImpEff term, and (2) solving these constraints.

#### **5.1 Elaboration of ImpEff into ExEff**

The grayed parts of Fig. 2 augment the typing rules for ImpEff value and computation terms with typing-directed elaboration to corresponding ExEff terms. The elaboration is mostly straightforward, mapping every ImpEff construct onto its corresponding ExEff construct while adding explicit type annotations to binders in Rules TmTmAbs, TmHandler and TmOp. Implicit appeals to subtyping are turned into explicit casts with coercions in Rules TmCastV and TmCastC. Rule TmLet introduces explicit binders for skeleton, type, and dirt variables, as well as for constraints. The latter also introduce coercion variables ω that can be used in casts. The binders are eliminated in Rule TmVar by means of explicit application with skeletons, types, dirts, and coercions. The coercions are produced by the auxiliary judgement Γ ⊢co γ : π, defined in Fig. 3, which provides a coercion witness for every subtyping proof.

As a sanity check, we have shown that elaboration preserves types.

#### **Theorem 2 (Type Preservation)**

– *If* Γ ⊢v v : A ⇝ v′ *then* elabΓ(Γ) ⊢v v′ : elabS(A)*.*

– *If* Γ ⊢c c : C ⇝ c′ *then* elabΓ(Γ) ⊢c c′ : elabC(C)*.*

Here elabΓ(Γ), elabS(A) and elabC(C) convert ImpEff environments and types into ExEff environments and types.

#### **5.2 Constraint Generation and Elaboration**

Constraint generation with elaboration into ExEff is presented in Figs. 8 (values) and 9 (computations). Before going into the details of each, we first introduce the three auxiliary constructs they use.

$$\begin{array}{rl} \text{constraint set} & \mathcal{P}, \mathcal{Q} ::= \bullet \mid \tau\_1 = \tau\_2, \mathcal{P} \mid \alpha : \tau, \mathcal{P} \mid \omega : \pi, \mathcal{P} \\ \text{typing environment} & \varGamma ::= \epsilon \mid \varGamma, x : S \\ \text{substitution} & \sigma ::= \bullet \mid \sigma \cdot [\tau/\varsigma] \mid \sigma \cdot [A/\alpha] \mid \sigma \cdot [\Delta/\delta] \mid \sigma \cdot [\gamma/\omega] \end{array}$$

At the heart of our algorithm are constraint sets P, containing three different kinds of constraints: (a) skeleton equalities of the form τ1 = τ2, (b) skeleton constraints of the form α : τ, and (c) wanted subtyping constraints of the form ω : π. The purpose of the first two becomes clear when we discuss constraint solving, in Sect. 5.3. Next, typing environments Γ only contain term variable bindings, while other variables represent unknowns of their sort and may end up being instantiated after constraint solving. Finally, during type inference we compute substitutions σ, for refining as-yet-unknown skeletons, types, dirts, and coercions. The last is essential, since our algorithm simultaneously performs type inference and elaboration into ExEff.

A substitution σ is a solution of the set P, written σ ⊨ P, if applying σ to all constraints in P yields derivable judgements.

**Values.** Constraint generation for values takes the form Q; Γ ⊢v v : A | Q′; σ ⇝ v′. It takes as inputs a set of wanted constraints Q, a typing environment Γ, and an ImpEff value v, and produces a value type A, a new set of wanted constraints Q′, a substitution σ, and an ExEff value v′.

Unlike standard HM, our inference algorithm does not keep constraint generation and solving separate. Instead, the two are interleaved, as indicated by


the additional arguments of our relation: (a) constraints Q are passed around in a stateful manner (i.e., they are input and output), and (b) substitutions σ generated from constraint solving constitute part of the relation output. We discuss the reason for this interleaved approach in Sect. 5.4; we now focus on the algorithm.

The rules are syntax-directed on the input ImpEff value. The first rule handles term variables x: as usual for constraint-based type inference, the rule instantiates the polymorphic type (∀ς̄. ∀(ᾱ : τ̄). ∀δ̄. π̄ ⇒ A) of x with fresh variables; these are placeholders that are determined during constraint solving. Moreover, the rule extends the wanted constraints P with π̄, appropriately instantiated. In ExEff, this corresponds to explicit skeleton, type, dirt, and coercion applications.

More interesting is the third rule, for term abstractions. As in standard Hindley-Damas-Milner inference [5], it generates a fresh type variable α for the type of the abstracted term variable x. In addition, it generates a fresh skeleton variable ς, to capture the (yet unknown) shape of α.

As explained in detail in Sect. 5.3, the constraint solver instantiates type variables only through their skeleton annotations. Because we want to give local constraint solving for the body c of the term abstraction the opportunity to produce a substitution σ that instantiates α, we have to pass in the annotation constraint α : ς.⁶ We apply the resulting substitution σ to the result type σ(α) → C.⁷

Finally, the fourth rule is concerned with handlers. Since it is the most complex of the rules, we discuss each of its premises separately:

Firstly, we infer a type Br ! Δr for the right-hand side of the return clause. Since αr is a fresh unification variable, just as for term abstraction we require αr : ςr, for a fresh skeleton variable ςr.

Secondly, we check every operation clause in O in order. For each clause, we generate fresh skeleton, type, and dirt variables (ςi, αi, and δi) to account for the (yet unknown) result type αi ! δi of the continuation k, while inferring type BOpi ! ΔOpi for the right-hand side cOpi.

More interesting is the (final) set of wanted constraints Q′. First, we assign to the handler the overall type

$$
\alpha\_{in} \, ! \, \delta\_{in} \Rightarrow \alpha\_{out} \, ! \, \delta\_{out}
$$

where ςin, αin, δin, ςout, αout, δout are fresh variables of the respective sorts. In turn, we require that (a) the type of the return clause is a subtype of αout ! δout (given by the combination of ω1 and ω2), (b) the right-hand-side type of each operation clause is a subtype of the overall result type: σn(BOpi ! ΔOpi) ≤ αout ! δout (witnessed by ω3i ! ω4i), (c) the actual types of the continuations Bi → αout ! δout in the operation clauses are subtypes of their assumed types Bi → σn(αi ! δi) (witnessed by ω5i), (d) the overall argument type αin is a subtype of the assumed type of x: σn(σr(αr)) (witnessed by ω6), and (e) the input dirt set δin is a subtype of the resulting dirt set δout, extended with the handled operations O (witnessed by ω7).

All the aforementioned implicit subtyping relations become explicit in the elaborated term cres, via explicit casts.

**Computations.** The judgement Q; Γ ⊢c c : C | Q′; σ ⇝ c′ generates constraints for computations.

The first rule handles term applications of the form v1 v2. After inferring a type for each subterm (A1 for v1 and A2 for v2), we generate the wanted constraint σ2(A1) ≤ A2 → α ! δ, with fresh type and dirt variables α and δ, respectively. The associated coercion variable ω is then used in the elaborated term to explicitly (up)cast v′1 to the expected type A2 → α ! δ.

The third rule handles polymorphic let bindings. First, we infer a type A for v, as well as wanted constraints Qv. Then, we simplify the wanted constraints Qv by means of the function solve (which we explain in detail in Sect. 5.3 below), obtaining a substitution σ′1 and a set of *residual constraints* Q′v.

⁶ This hints at why we need to pass constraints in a stateful manner.

⁷ Though σ refers to ImpEff types, we abuse notation to save clutter and apply it directly to ExEff entities too.

**Fig. 9.** Constraint Generation with Elaboration (Computations)

Generalization of x's type is performed by auxiliary function *split*, given by the following clause:

$$\begin{array}{c} \bar{\alpha} = fv\_{\alpha}(\mathcal{Q}) \cup fv\_{\alpha}(A) \setminus fv\_{\alpha}(\varGamma) \qquad \bar{\delta} = fv\_{\delta}(\mathcal{Q}) \cup fv\_{\delta}(A) \setminus fv\_{\delta}(\varGamma) \\ \bar{\varsigma} = \{ \varsigma \mid (\alpha : \varsigma) \in \mathcal{Q},\ \nexists \alpha'.\ \alpha' \notin \bar{\alpha} \wedge (\alpha' : \varsigma) \in \mathcal{Q} \} \\ \mathcal{Q}\_{1} = \{ (\omega : \pi) \mid (\omega : \pi) \in \mathcal{Q},\ fv(\pi) \not\subseteq fv(\varGamma) \} \qquad \mathcal{Q}\_{2} = \mathcal{Q} - \mathcal{Q}\_{1} \\ \hline \mathit{split}(\varGamma, \mathcal{Q}, A) = \langle \bar{\varsigma}, \overline{\alpha : \tau}, \bar{\delta}, \mathcal{Q}\_{1}, \mathcal{Q}\_{2} \rangle \end{array}$$

In essence, *split* generates the type (scheme) of x in parts. Additionally, it computes the subset Q2 of the input constraints Q that do not depend on locally bound variables. Such constraints can be floated "upwards" and are passed as input when inferring a type for c. The remainder of the rule is self-explanatory.
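To make the role of *split* concrete, here is a small Python model with plain sets standing in for the free-variable computations. The function name `split`, the argument `env_fvs`, and the encoding of constraints as pairs of a coercion-variable name and its free variables are all assumptions of this sketch, not the paper's formalism.

```python
# Illustrative model of split: pick variables to generalize and
# partition the wanted constraints into those kept in the scheme (Q1)
# and those floated "upwards" past the let (Q2).

def split(env_fvs, constraints, type_fvs):
    """constraints: list of (coercion_var, free_vars) pairs for ω : π.

    Returns (generalizable vars, Q1, Q2)."""
    cons_fvs = set().union(*(fv for _, fv in constraints)) if constraints else set()
    gen = (cons_fvs | type_fvs) - env_fvs                 # free in Q or A, not in Γ
    q1 = [c for c in constraints if not c[1] <= env_fvs]  # mention local variables
    q2 = [c for c in constraints if c[1] <= env_fvs]      # depend only on Γ
    return gen, q1, q2
```

Here `q2` corresponds to the constraints that may be solved in the enclosing scope, exactly as the prose above describes.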

The fourth rule handles operation calls. Observe that in the elaborated term, we upcast the inferred type to match the expected type in the signature.

The fifth rule handles sequences. The requirement that all computations in a do-construct have the same dirt set is expressed in the wanted constraints σ2(Δ1) ≤ δ and Δ2 ≤ δ (where δ is a fresh dirt variable: the resulting dirt set), witnessed by coercion variables ω1 and ω2. Both coercion variables are used in the elaborated term to upcast c′1 and c′2, so that both draw effects from the same dirt set δ.

Finally, the sixth rule is concerned with effect handling. After inferring type A1 for the handler v, we require that it takes the form of a handler type, witnessed by coercion variable ω1 : σ2(A1) ≤ (α1 ! δ1 ⇒ α2 ! δ2), for fresh α1, α2, δ1, δ2. To ensure that the type A2 ! Δ2 of c matches the expected type, we require that A2 ! Δ2 ≤ α1 ! δ1. Our syntax does not include coercion variables for computation subtyping; we achieve the same effect by combining ω2 : A2 ≤ α1 and ω3 : Δ2 ≤ δ1.

**Theorem 3 (Soundness of Inference).** *If* •; Γ ⊢v v : A | Q; σ ⇝ v′ *then for any* σ′ ⊨ Q*, we have* (σ′ · σ)(Γ) ⊢v v : σ′(A) ⇝ σ′(v′)*, and analogously for computations.*

**Theorem 4 (Completeness of Inference).** *If* Γ ⊢v v : A ⇝ v′ *then we have* •; Γ ⊢v v : A′ | Q; σ ⇝ v″ *and there exist* σ′ ⊨ Q *and* γ*, such that* σ′(v″) = v′ *and* σ′(Γ) ⊢co γ : σ′(A′) ≤ A*. An analogous statement holds for computations.*

### **5.3 Constraint Solving**

The second phase of our inference-and-elaboration algorithm is the constraint solver. It is implemented by the function solve, with signature:

$$\boxed{\mathtt{solve}(\sigma; \mathcal{P}; \mathcal{Q}) = (\sigma', \mathcal{P}')}$$

It takes three inputs: the substitution σ accumulated so far, a list of already processed constraints P, and a queue of still-to-be-processed constraints Q. There are two outputs: the substitution σ′ that solves the constraints and the residual constraints P′. The substitutions σ and σ′ contain four kinds of mappings: ς ↦ τ, α ↦ A, δ ↦ Δ and ω ↦ γ, which instantiate skeleton variables, type variables, dirt variables and coercion variables, respectively.

**Theorem 5 (Correctness of Solving).** *For any set* Q*, the call* solve(•; •; Q) *either results in a failure, in which case* Q *has no solutions, or returns* (σ, P) *such that for any* σ′ ⊨ Q*, there exists* σ″ ⊨ P *such that* σ′ = σ″ · σ*.*

The solver is invoked with solve(•; •; <sup>Q</sup>), to process the constraints <sup>Q</sup> generated in the first phase of the algorithm, i.e., with an empty substitution and no processed constraints. The solve function is defined by case analysis on the queue.

**Empty Queue.** When the queue is empty, all constraints have been processed. What remains are the residual constraints and the solving substitution σ, which are both returned as the result of the solver.

solve(σ; P; •) = (σ, P)

**Skeleton Equalities.** The next set of cases we consider are those where the queue is non-empty and its first element is an equality between skeletons τ1 = τ2. We consider seven possible cases, based on the structure of τ1 and τ2, that together essentially implement conventional unification as used in Hindley-Milner type inference [5].

```
solve(σ; P; τ1 = τ2, Q) = match τ1 = τ2 with
  | ς = ς                   → solve(σ; P; Q)
  | ς = τ                   → if ς ∉ fvς(τ) then let σ′ = [τ/ς] in solve(σ′ · σ; •; σ′(Q, P)) else fail
  | τ = ς                   → if ς ∉ fvς(τ) then let σ′ = [τ/ς] in solve(σ′ · σ; •; σ′(Q, P)) else fail
  | Unit = Unit             → solve(σ; P; Q)
  | (τ1 → τ2) = (τ3 → τ4)   → solve(σ; P; τ1 = τ3, τ2 = τ4, Q)
  | (τ1 ⇒ τ2) = (τ3 ⇒ τ4)   → solve(σ; P; τ1 = τ3, τ2 = τ4, Q)
  | otherwise               → fail
```

The first case applies when both skeletons are the same type variable ς. Then the equality trivially holds. Hence we drop it and proceed with solving the remaining constraints. The next two cases apply when either <sup>τ</sup><sup>1</sup> or <sup>τ</sup><sup>2</sup> is a skeleton variable ς. If the occurs check fails, there is no finite solution and the algorithm signals failure. Otherwise, the constraint is solved by instantiating the ς. This additional substitution is accumulated and applied to all other constraints <sup>P</sup>, <sup>Q</sup>. Because the substitution might have modified some of the already processed constraints P, we have to revisit them. Hence, they are all pushed back onto the queue, which is processed recursively.

The next three cases consider three different ways in which the two skeletons can have the same instantiated top-level structure. In those cases the equality is decomposed into equalities on the subterms, which are pushed onto the queue and processed recursively.

The last catch-all case deals with all ways in which the two skeletons can be instantiated to different structures. Then there is no solution.
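The seven cases amount to conventional first-order unification with an occurs check. The following Python sketch illustrates them under an assumed toy encoding (skeleton variables as lowercase strings, `'Unit'` as a constant, and tagged tuples `('->', τ1, τ2)` and `('=>', τ1, τ2)` for the two structured skeleton forms); it is illustrative, not the paper's solver.

```python
# First-order unification over toy skeleton terms.

def is_var(t):
    return isinstance(t, str) and t != 'Unit'

def occurs(v, t):
    """Occurs check: does skeleton variable v appear in t?"""
    if isinstance(t, str):
        return t == v
    return any(occurs(v, s) for s in t[1:])

def subst(s, t):
    """Apply substitution s (a dict from variables to skeletons) to t."""
    if isinstance(t, str):
        return s.get(t, t)
    return (t[0],) + tuple(subst(s, x) for x in t[1:])

def unify(queue):
    """Process a queue of skeleton equalities; return the unifier or fail."""
    s = {}
    while queue:
        t1, t2 = queue.pop()
        t1, t2 = subst(s, t1), subst(s, t2)
        if t1 == t2:
            continue                              # trivial equality: drop it
        if is_var(t2) and not is_var(t1):
            t1, t2 = t2, t1                       # put the variable on the left
        if is_var(t1):
            if occurs(t1, t2):
                raise TypeError('occurs check failed: no finite solution')
            s = {v: subst({t1: t2}, t) for v, t in s.items()}
            s[t1] = t2                            # instantiate the variable
        elif t1[0] == t2[0]:
            queue.extend(zip(t1[1:], t2[1:]))     # decompose equal structure
        else:
            raise TypeError('mismatched skeleton structure')
    return s
```

The difference from this classic algorithm is only bookkeeping: the actual solver also re-examines the already processed constraints P after each instantiation.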

**Skeleton Annotations.** The next four cases consider a skeleton annotation α : τ at the head of the queue, and propagate the skeleton instantiation to the type variable. The first case, where the skeleton is a variable ς, requires no action: it moves the annotation to the processed constraints and proceeds with the remainder of the queue. In the other three cases, the skeleton is instantiated and the solver instantiates the type variable with the corresponding structure, introducing fresh variables for any subterms. The instantiating substitution is accumulated and applied to the remaining constraints, which are processed recursively.

```
solve(σ; P; α : τ, Q) = match τ with
  | ς        → solve(σ; P, α : τ; Q)
  | Unit     → let σ′ = [Unit/α] in solve(σ′ · σ; •; σ′(Q, P))
  | τ1 → τ2  → let σ′ = [(α1^τ1 → α2^τ2 ! δ)/α] in
               solve(σ′ · σ; •; α1 : τ1, α2 : τ2, σ′(Q, P))
  | τ1 ⇒ τ2  → let σ′ = [(α1^τ1 ! δ1 ⇒ α2^τ2 ! δ2)/α] in
               solve(σ′ · σ; •; α1 : τ1, α2 : τ2, σ′(Q, P))
```
**Value Type Subtyping.** Next are the cases where a subtyping constraint A1 ≤ A2 between two value types, with the coercion variable ω as evidence, is at the head of the queue. We consider six different situations.

```
solve(σ; P; ω : A1 ≤ A2, Q) = match A1 ≤ A2 with
  | A ≤ A       → let T = elabS(A) in solve([⟨T⟩/ω] · σ; P; Q)
  | α^τ1 ≤ A    → let τ2 = skeleton(A) in solve(σ; P, ω : α^τ1 ≤ A; τ1 = τ2, Q)
  | A ≤ α^τ1    → let τ2 = skeleton(A) in solve(σ; P, ω : A ≤ α^τ1; τ2 = τ1, Q)
  | (A1 → B1 ! Δ1) ≤ (A2 → B2 ! Δ2) →
      let σ′ = [(ω1 → ω2 ! ω3)/ω] in
      solve(σ′ · σ; P; ω1 : A2 ≤ A1, ω2 : B1 ≤ B2, ω3 : Δ1 ≤ Δ2, Q)
  | (A1 ! Δ1 ⇒ A2 ! Δ2) ≤ (A3 ! Δ3 ⇒ A4 ! Δ4) →
      let σ′ = [(ω1 ! ω2 ⇒ ω3 ! ω4)/ω] in
      solve(σ′ · σ; P; ω1 : A3 ≤ A1, ω2 : Δ3 ≤ Δ1, ω3 : A2 ≤ A4, ω4 : Δ2 ≤ Δ4, Q)
  | otherwise   → fail
```

If the two types are equal, the subtyping holds trivially through reflexivity. The solver thus drops the constraint and instantiates ω with the reflexivity coercion ⟨T⟩. Note that each coercion variable appears in only one constraint, so we only accumulate the substitution and do not have to apply it to the other constraints. In the next two cases, one of the two types is a type variable α. Then we move the constraint to the processed set. We also add an equality constraint between the skeletons⁸ to the queue. This enforces the invariant that only types with the same skeleton are compared. Through the skeleton equality, the type structure (if any) of the type is also transferred to the type variable. The next two cases concern two types with the same top-level instantiation. The solver then decomposes the constraint into constraints on the corresponding subterms and appropriately relates the evidence of the old constraint to the new ones. The final case catches all situations where the two types are instantiated with a different structure, and thus there is no solution.

Auxiliary function *skeleton*(*A*) computes the skeleton of *A*.
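The decomposition step can be illustrated for function types: a constraint between two arrow types splits into a contravariant constraint on the argument types and covariant constraints on the result types and dirts, while ω is instantiated to ω1 → ω2 ! ω3. The tuple encoding, the fresh-name supply, and the name `decompose_arrow` below are assumptions of this Python sketch, not the paper's formalism.

```python
import itertools

# Fresh supply of coercion-variable names w1, w2, ...
_fresh = map('w{}'.format, itertools.count(1))

def decompose_arrow(omega, lhs, rhs):
    """ω : (A1 → B1 ! Δ1) ≤ (A2 → B2 ! Δ2) becomes three constraints,
    and ω is instantiated to the structural coercion ω1 → ω2 ! ω3."""
    a1, b1, d1 = lhs                    # A1 → B1 ! Δ1
    a2, b2, d2 = rhs                    # A2 → B2 ! Δ2
    w1, w2, w3 = (next(_fresh) for _ in range(3))
    coercion = ('arrow', w1, w2, w3)    # ω ↦ ω1 → ω2 ! ω3
    wanted = [(w1, a2, a1),             # argument type: contravariant
              (w2, b1, b2),             # result type: covariant
              (w3, d1, d2)]             # dirt: covariant
    return {omega: coercion}, wanted
```

Note how the argument constraint flips its sides: the argument position of a function type is contravariant, exactly as in the fourth case of the pseudocode above.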

**Dirt Subtyping.** The final six cases deal with subtyping constraints between dirts.

⁸ We implicitly annotate every type variable with its skeleton: α^τ.

```
solve(σ; P; ω : Δ ≤ Δ′, Q) = match Δ ≤ Δ′ with
  | O ∪ δ ≤ O′ ∪ δ′ →
      if O ≠ ∅ then let σ′ = [((O \ O′) ∪ δ″)/δ′ , (O ∪ ω′)/ω] in
                    solve(σ′ · σ; •; (ω′ : δ ≤ σ′(Δ′)), σ′(Q, P))
               else solve(σ; P, (ω : Δ ≤ Δ′); Q)
  | ∅ ≤ Δ′          → solve([∅_Δ′/ω] · σ; P; Q)
  | δ ≤ ∅           → let σ′ = [∅/δ , ∅_∅/ω] in solve(σ′ · σ; •; σ′(Q, P))
  | O ∪ δ ≤ O′      → if O ⊆ O′ then let σ′ = [(O ∪ ω′)/ω] in
                                     solve(σ′ · σ; P, (ω′ : δ ≤ O′); Q)
                      else fail
  | O ≤ O′          → if O ⊆ O′ then let σ′ = [(O ∪ ∅_(O′\O))/ω] in solve(σ′ · σ; P; Q)
                      else fail
  | O ≤ O′ ∪ δ′     → let σ′ = [((O \ O′) ∪ δ″)/δ′ , (O ∪ ∅_((O′\O) ∪ δ″))/ω] in
                      solve(σ′ · σ; •; σ′(Q, P))
```
If the two dirts are of the general form O ∪ δ and O′ ∪ δ′, we distinguish two subcases. Firstly, if O is empty, there is nothing to be done and we move the constraint to the processed set. Secondly, if O is non-empty, we partially instantiate δ′ with any of the operations that appear in O but not in O′. We then drop O from the constraint and, after substitution, proceed with processing all constraints. For instance, for {Op1} ∪ δ ≤ {Op2} ∪ δ′, we instantiate δ′ to {Op1} ∪ δ″ (where δ″ is a fresh dirt variable) and proceed with the simplified constraint δ ≤ {Op1, Op2} ∪ δ″. Note that due to the set semantics of dirts, it is not valid to simplify the above constraint to δ ≤ {Op2} ∪ δ″. After all, the substitution [{Op1}/δ, ∅/δ″] solves both the former and the original constraint, but not the latter.

The second case, ∅ ≤ Δ′, always holds and is discharged by instantiating ω to ∅_Δ′. The third case, δ ≤ ∅, has only one solution: δ ↦ ∅, with coercion ∅_∅. The fourth case, O ∪ δ ≤ O′, has as many solutions as there are subsets of O′, provided that O ⊆ O′. We then simplify the constraint to δ ≤ O′, which we move to the set of processed constraints. The fifth case, O ≤ O′, holds iff O ⊆ O′. The last case, O ≤ O′ ∪ δ′, is like the first, but without a dirt variable on the left-hand side. We can satisfy it in a similar fashion, by partially instantiating δ′ with (O \ O′) ∪ δ″, where δ″ is a fresh dirt variable. The constraint is then satisfied and can be discarded.
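The worked example above can be traced with a small Python sketch, modelling a dirt as a pair of an operation set and a tail variable; the function name `step_dirt` and this pair encoding are illustrative assumptions, not the paper's definitions.

```python
# Sketch of the first dirt-subtyping case with a non-empty O:
# for O ∪ δ ≤ O' ∪ δ', instantiate δ' to (O \ O') ∪ δ'' for a fresh δ'',
# leaving the residual constraint δ ≤ O ∪ O' ∪ δ''.

def step_dirt(O, delta, O2, delta2, fresh_var):
    assert O, "this case requires a non-empty O"
    d_new = fresh_var()
    inst = {delta2: (O - O2, d_new)}      # δ' ↦ (O \ O') ∪ δ''
    residual = (delta, (O | O2, d_new))   # δ ≤ O ∪ O' ∪ δ''
    return inst, residual

# The worked example: {Op1} ∪ δ ≤ {Op2} ∪ δ'
inst, residual = step_dirt({'Op1'}, 'd', {'Op2'}, "d'", lambda: "d''")
```

Running the example instantiates δ′ to {Op1} ∪ δ″ and leaves the residual constraint δ ≤ {Op1, Op2} ∪ δ″, matching the prose.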

### **5.4 Discussion**

At first glance, the constraint generation algorithm of Sect. 5.2 might seem needlessly complex, due to the eager constraint solving for let-generalization. Yet, we want to generalize local let-bound values over both type and skeleton variables,⁹ which means that we must solve all equations between skeletons before generalizing. In turn, since skeleton constraints are generated when solving subtyping constraints (Sect. 5.3), all skeleton annotations should be available during constraint solving. This cannot be achieved unless the generated constraints are propagated statefully.

### **6 Erasure of Effect Information from ExEff**

### **6.1 The SkelEff Language**

The target of the erasure is SkelEff, which is essentially a copy of ExEff from which all effect information Δ, type information *T*, and coercions γ have been removed. Instead, skeletons τ play the role of plain types. Thus, SkelEff is essentially System F extended with term-level (but not type-level) support for algebraic effects. Figure 10 defines the syntax of SkelEff. The type system and operational semantics of SkelEff follow from those of ExEff.

**Discussion.** The main point of SkelEff is to show that we can erase the effects and subtyping from ExEff to obtain types that are compatible with a System F-like language. At the term-level SkelEff also resembles a subset of Multicore OCaml [6], which provides native support for algebraic effects and handlers but features no explicit polymorphism. Moreover, SkelEff can also serve as a staging area for further elaboration into System F-like languages without support for algebraic effects and handlers (e.g., Haskell or regular OCaml). In those cases, computation terms can be compiled to one of the known encodings in the literature, such as a free monad representation [10,22], with delimited control [11], or using continuation-passing style [13], while values can typically be carried over as they are.

### **6.2 Erasure**

Figure 11 defines erasure functions ε<sub>v</sub><sup>σ</sup>(v), ε<sub>c</sub><sup>σ</sup>(c), ε<sub>V</sub><sup>σ</sup>(*T*), ε<sub>C</sub><sup>σ</sup>(*C*), and ε<sub>E</sub><sup>σ</sup>(Γ) for values, computations, value types, computation types, and type environments, respectively. All five functions take as an additional parameter a substitution σ from the free type variables α to their skeletons τ.

Thanks to the skeleton-based design of ExEff, erasure is straightforward. All types are erased to their skeletons, dropping quantifiers for type variables and all occurrences of dirt sets. Moreover, coercions are dropped from values

<sup>9</sup> As will become apparent in Sect. 6, if we only generalize at the top over skeleton variables, the erasure does not yield local polymorphism.

**Fig. 11.** Definition of type erasure.

and computations. Finally, all binders and elimination forms for type variables, dirt set variables and coercions are dropped from values and type environments.
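As a rough illustration of this erasure, here is a Python sketch over a hypothetical tuple-encoded fragment of ExEff. The constructors (`"tyabs"`, `"cast"`, and so on) and the treatment of computation types are our own simplifications, not the paper's definitions in Fig. 11.

```python
# A hedged sketch of erasure for an ExEff-like fragment: types erase to
# their skeletons, quantifiers over type variables are dropped, and casts
# (coercion applications) disappear.  The AST encoding is ours.

def erase_ty(sigma, T):
    tag = T[0]
    if tag == "var":                 # α erases to its skeleton σ(α)
        return sigma[T[1]]
    if tag == "arrow":               # T → C: the dirt of C is dropped
        _, T1, (T2, _dirt) = T
        return ("arrow", erase_ty(sigma, T1), erase_ty(sigma, T2))
    if tag == "forall":              # ∀α:τ.T: quantifier over α dropped
        _, alpha, sk, body = T
        return erase_ty({**sigma, alpha: sk}, body)
    return T                         # base types are their own skeleton

def erase_val(sigma, v):
    tag = v[0]
    if tag == "var":
        return v
    if tag == "lam":                 # λ(x:T).c keeps the binder, erases T
        _, x, T, body = v
        return ("lam", x, erase_ty(sigma, T), erase_comp(sigma, body))
    if tag == "tyabs":               # type-variable binders are dropped
        _, alpha, sk, body = v
        return erase_val({**sigma, alpha: sk}, body)
    if tag in ("dirtabs", "coabs"):  # dirt/coercion binders likewise
        return erase_val(sigma, v[-1])
    if tag == "cast":                # v ▹ γ: the coercion is dropped
        return erase_val(sigma, v[1])
    raise ValueError(tag)

def erase_comp(sigma, c):
    if c[0] == "return":
        return ("return", erase_val(sigma, c[1]))
    return c                         # other computation forms: analogous

# A polymorphic identity at skeleton int: binder and cast both vanish.
v = ("tyabs", "a", ("int",), ("cast", ("lam", "x", ("var", "a"),
                                       ("return", ("var", "x"))), "γ"))
assert erase_val({}, v) == ("lam", "x", ("int",), ("return", ("var", "x")))
```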

The expected theorems hold. Firstly, types are preserved by erasure.<sup>10</sup>

**Theorem 6 (Type Preservation).** *If* Γ ⊢<sub>v</sub> v : *T then* ε<sub>E</sub><sup>Γ</sup>(Γ) ⊢<sub>ev</sub> ε<sub>v</sub><sup>Γ</sup>(v) : ε<sub>V</sub><sup>Γ</sup>(*T*)*. If* Γ ⊢<sub>c</sub> c : *C then* ε<sub>E</sub><sup>Γ</sup>(Γ) ⊢<sub>ec</sub> ε<sub>c</sub><sup>Γ</sup>(c) : ε<sub>C</sub><sup>Γ</sup>(*C*)*.*

Here we abuse notation and use Γ as the substitution from type variables to skeletons that the erasure functions expect.

Finally, we have that erasure preserves the operational semantics.

**Theorem 7 (Semantic Preservation).** *If* v ⇝<sub>v</sub> v′ *then* ε<sub>v</sub><sup>σ</sup>(v) ≡<sub>v</sub> ε<sub>v</sub><sup>σ</sup>(v′)*. If* c ⇝<sub>c</sub> c′ *then* ε<sub>c</sub><sup>σ</sup>(c) ≡<sub>c</sub> ε<sub>c</sub><sup>σ</sup>(c′)*.*

In both cases, ≡ denotes the congruence closure of the step relation in SkelEff. The choice of substitution σ does not matter, as types do not affect behaviour.

<sup>10</sup> Typing judgements for SkelEff values and computations take the forms Γ ⊢<sub>ev</sub> v : τ and Γ ⊢<sub>ec</sub> c : τ.

**Discussion.** Typically, when type information is erased from call-by-value languages, type binders are erased by replacing them with other (dummy) binders. For instance, the expected definition of erasure would be:

$$
\epsilon\_\mathbf{v}^\sigma(\Lambda(\alpha:\tau).v) = \lambda(x:\mathsf{Unit}).\epsilon\_\mathbf{v}^\sigma(v).
$$

This replacement is motivated by a desire to preserve the behaviour of the typed terms: if binders were simply dropped, values could be turned into computations that trigger their side-effects immediately, rather than at the later point where the original binder was eliminated. However, there is no call for this circumspect approach in our setting, as our grammatical partition of terms into values (without side-effects) and computations (with side-effects) guarantees that this problem cannot happen when we erase values to values and computations to computations.

### **7 Related Work and Conclusion**

**Eff 's Implicit Type System.** The most closely related work is that of Pretnar [20] on inferring algebraic effects for Eff, which is the basis for our implicitly-typed ImpEff calculus, its type system, and the type inference algorithm. There are three major differences from Pretnar's inference algorithm.

Firstly, our work introduces an explicitly-typed calculus. For this reason we have extended the constraint generation phase with the elaboration into ExEff and the constraint solving phase with the construction of coercions.

Secondly, we add skeletons to guarantee erasure. Skeletons also allow us to use standard occurs-check during unification. In contrast, unification in Pretnar's algorithm is inspired by Simonet [24] and performs the occurs-check up to the equivalence closure of the subtyping relation. In order to maintain invariants, all variables in an equivalence class (also called a skeleton) must be instantiated simultaneously, whereas we can process one constraint at a time. As these classes turn out to be surrogates for the underlying skeleton types, we have decided to keep the name.

Finally, Pretnar incorporates garbage collection of constraints [19]. The aim of this approach is to obtain unique and simple type schemes by eliminating redundant constraints. Garbage collection is not suitable for our use as type variables and coercions witnessing subtyping constraints cannot simply be dropped, but must be instantiated in a suitable manner, which cannot be done in general.

Consider for instance a situation with type variables α₁, α₂, α₃, α₄, and α₅, where α₁ ≤ α₃, α₂ ≤ α₃, α₃ ≤ α₄, and α₃ ≤ α₅. Suppose that α₃ does not appear in the type. Then garbage collection would eliminate it and replace the constraints by α₁ ≤ α₄, α₂ ≤ α₄, α₁ ≤ α₅, and α₂ ≤ α₅. While garbage collection guarantees that for any ground instantiation of the remaining type variables there exists a valid ground instantiation for α₃, ExEff would need to be extended with joins (or meets) to express a generically valid instantiation like α₁ ∨ α₂. Moreover, we would need additional coercion formers to establish α₁ ≤ (α₁ ∨ α₂) or (α₁ ∨ α₂) ≤ α₄.
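The elimination step in this example can be checked mechanically. The following Python snippet is our own illustration, not part of the system: it replaces an eliminated variable by the transitive constraints through it.

```python
# Garbage collection of a subtyping variable, in miniature: dropping a
# variable keeps all constraints not mentioning it, plus one constraint
# a ≤ b for every path a ≤ var ≤ b through the eliminated variable.

def eliminate(constraints, var):
    """Drop `var` from a set of (lhs, rhs) subtyping constraints."""
    into = {a for a, b in constraints if b == var}
    outof = {b for a, b in constraints if a == var}
    kept = {(a, b) for a, b in constraints if var not in (a, b)}
    return kept | {(a, b) for a in into for b in outof}

# The example from the text: eliminating α3 yields the four constraints
# between α1, α2 and α4, α5.
cs = {("a1", "a3"), ("a2", "a3"), ("a3", "a4"), ("a3", "a5")}
assert eliminate(cs, "a3") == {("a1", "a4"), ("a1", "a5"),
                               ("a2", "a4"), ("a2", "a5")}
```

The check confirms that any ground solution of the reduced set extends to one for α₃; what it cannot produce is a single ExEff-expressible witness for α₃, which is exactly why joins like α₁ ∨ α₂ would be needed.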

As these additional constructs considerably complicate the calculus, we propose a simpler solution. We use ExEff as it is for internal purposes, but display types to programmers in their garbage-collected form.

**Calculi with Explicit Coercions.** The notion of explicit coercions is not new; Mitchell [15] introduced the idea of inserting coercions during type inference for ML-based languages, as a means for explicit casting between different numeric types.

Breazu-Tannen et al. [3] also present a translation of languages with inheritance polymorphism into System F extended with coercions. Although their coercion combinators are very similar to our coercion forms, they do not include inversion forms, which are crucial for the proof of type safety of our system. Moreover, Breazu-Tannen et al.'s coercions are terms, and thus cannot be erased.

Much closer to ExEff is Crary's coercion calculus for inclusive subtyping [4], from which we borrowed the stratification of value results. Crary's system supports neither coercion abstraction nor coercion inversion forms.

System F<sup>C</sup> [25] uses explicit type-equality coercions to encode complex language features (e.g. GADTs [16] or type families [23]). Though ExEff's coercions are proofs of subtyping rather than type equality, our system has a lot in common with it, including the inversion coercion forms and the "push" rules.

**Future Work.** Our plans focus on resuming the postponed work on efficient compilation of handlers. First, we intend to adjust program transformations to the explicit type information. We hope that this will not only make the optimizer more robust, but also expose new optimization opportunities. Next, we plan to write compilers to both Multicore OCaml and standard OCaml, though for the latter, we must first adapt the notion of erasure to a target calculus without algebraic effect handlers. Finally, once the compiler shows promising preliminary results, we plan to extend it to other Eff features such as user-defined types or recursion, allowing us to benchmark it on more realistic programs.

**Acknowledgements.** We would like to thank the anonymous reviewers for careful reading and insightful comments. Part of this work is funded by the Flemish Fund for Scientific Research (FWO). This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-17-1-0326.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Concurrency

### **A Separation Logic for a Promising Semantics**

Kasper Svendsen<sup>1</sup>, Jean Pichon-Pharabod<sup>1</sup>, Marko Doko<sup>2(B)</sup>, Ori Lahav<sup>3</sup>, and Viktor Vafeiadis<sup>2</sup>

> <sup>1</sup> University of Cambridge, Cambridge, UK <sup>2</sup> MPI-SWS, Kaiserslautern and Saarbrücken, Germany (mdoko@mpi-sws.org) <sup>3</sup> Tel Aviv University, Tel Aviv, Israel

**Abstract.** We present SLR, the first expressive program logic for reasoning about concurrent programs under a weak memory model addressing the out-of-thin-air problem. Our logic includes the standard features from existing logics, such as RSL and GPS, that were previously known to be sound only under stronger memory models: (1) separation, (2) per-location invariants, and (3) ownership transfer via release-acquire synchronisation—as well as novel features for reasoning about (4) the absence of out-of-thin-air behaviours and (5) coherence. The logic is proved sound over the recent "promising" memory model of Kang et al., using an argument substantially different from the soundness proofs of logics for simpler memory models.

### **1 Introduction**

Recent years have seen the emergence of several program logics [2,6,8,16,23,24, 26–28] for reasoning about programs under weak memory models. These program logics are valuable tools for structuring program correctness proofs, and enabling programmers to reason about the correctness of their programs without necessarily knowing the formal semantics of the programming language. So far, however, they have only been applied to relatively strong memory models (such as TSO [19] or release/acquire consistency [15] that can be expressed as a constraint on individual candidate program executions) and provide little to no reasoning principles to deal with C/C++ "relaxed" accesses.

The main reason for this gap is that the behaviour of relaxed accesses is notoriously hard to specify [3,5]. Up until recently, memory models have either been too strong (e.g., [5,14,17]), forbidding some behaviours observed with modern hardware and compilers, or they have been too weak (e.g., [4]), allowing so-called out-of-thin-air (OOTA) behaviour even though it does not occur in practice and is highly problematic.

One observable behaviour forbidden by the strong models is the load buffering behaviour illustrated by the example below, which, when started with both locations x and y containing 0, can end with both r₁ and r₂ containing 1. This behaviour is observable on certain ARMv7 processors after the compiler optimises r₂ + 1 − r₂ to 1.

$$\begin{array}{l} r\_1 := [x]\_{\mathtt{rlx}}; \text{ // reads 1} \\ [y]\_{\mathtt{rlx}} := r\_1 \end{array} \quad \left\| \quad \begin{array}{l} r\_2 := [y]\_{\mathtt{rlx}}; \text{ // reads 1} \\ [x]\_{\mathtt{rlx}} := r\_2 + 1 - r\_2 \end{array} \right. \qquad \text{(LB+data+fakedep)}$$

However, one OOTA behaviour they should not allow is the following example by Boehm and Demsky [5]. When started with two completely disjoint lists a and b, by updating them separately in parallel, it should not be allowed to end with a and b pointing to each other, as that would violate physical separation (for simplicity, in these lists, a location just holds the address of the next element):

$$\begin{array}{l} r\_1 := [a]\_{\mathtt{rlx}}; \text{ // reads } b \\ [r\_1]\_{\mathtt{rlx}} := a \end{array} \quad \left\| \quad \begin{array}{l} r\_2 := [b]\_{\mathtt{rlx}}; \text{ // reads } a \\ [r\_2]\_{\mathtt{rlx}} := b \end{array} \right. \qquad \text{(Disjoint-Lists)}$$

Because of this specification gap, program logics either do not reason about relaxed accesses, or they assume overly strengthened models that disallow some behaviours that occur in practice (as discussed in Sect. 5).

Recently, there have been several proposals of programming language memory models that allow load buffering behaviour, but forbid obvious out-of-thin-air behaviours [10,13,20]. This development has enabled us to develop a program logic that provides expressive reasoning principles for relaxed accesses, without relying on overly strong models.

In this paper, we present SLR, a separation logic based on RSL [27], extended with strong reasoning principles for relaxed accesses, which we prove sound over the recent "promising" semantics of Kang et al. [13]. SLR features per-location invariants [27] and physical separation [22], as well as novel assertions that we use to show the absence of *OOTA behaviours* and to reason about various *coherence* examples. (Coherence is a property of memory models that requires the existence of a per-location total order on writes that reads respect.)

There are two main contributions of this work.

First, SLR is the first logic which can prove absence of OOTA in all the standard litmus tests. As such, it provides more evidence to the claim that the promising semantics solves the out-of-thin-air problem in a satisfactory way. The paper that introduced the promising semantics [13] comes with three DRF theorems and a simplistic value logic. These reasoning principles are enough to show absence of some simple out-of-thin-air behaviours, but it is still very easy to end up beyond the reasoning power of these two techniques. For instance, they cannot be used to prove that r₁ = 0 in the following "random number generator" litmus test,<sup>1</sup> where both the x and y locations initially hold 0.

$$\begin{array}{l} r\_1 := [x]\_{\mathtt{rlx}}; \\ [y]\_{\mathtt{rlx}} := r\_1 + 1 \end{array} \quad \left\| \quad \begin{array}{l} r\_2 := [y]\_{\mathtt{rlx}}; \\ [x]\_{\mathtt{rlx}} := r\_2 \end{array} \right. \qquad \text{(RNG)}$$

The subtlety of this litmus test is the following: if the first thread reads a certain value v from x, then it writes v + 1 to y, which the second thread can read, and

<sup>1</sup> The litmus test is called this way because some early attempts to solve the OOTA problem allowed this example to return arbitrary values for x and y.

write to x; this, however, does not enable the first thread to read v + 1. SLR features novel assertions that allow it to handle those and other examples, as shown in the following section.

The second major contribution is the proof of soundness of SLR over the promising semantics [13].<sup>2</sup> The promising semantics is an operational model that represents memory as a collection of timestamped write messages. Besides the usual steps that execute the next command of a thread, the model has a nonstandard step that allows a thread to promise to perform a write in the future, provided that it can guarantee to be able to fulfil its promise. After a write is promised, other threads may read from that write as if it had already happened. Promises allow the load-store reordering needed to exhibit the load buffering behaviour above, and yet seem, from a series of litmus tests, constrained enough so as not to introduce out-of-thin-air behaviour.
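The message-based memory and the promise/fulfil steps can be caricatured in a few lines of Python. This toy model is ours and omits almost everything in the real semantics (thread views, ordering constraints, and in particular certification), but it shows the unusual shape of promise steps.

```python
# A toy model (ours, heavily simplified) of the promising machine's
# memory: a set of timestamped write messages, plus per-thread sets of
# outstanding promises.

class Memory:
    """Memory as a set of write messages (location, value, timestamp)."""
    def __init__(self):
        self.msgs = {("x", 0, 0), ("y", 0, 0)}   # initial writes at ts 0

    def write(self, loc, val, ts):
        # Timestamps at a single location must be distinct.
        assert all(not (l == loc and t == ts) for l, _, t in self.msgs)
        self.msgs.add((loc, val, ts))

    def read(self, loc):
        """Any message at loc may be read (no thread views in this toy)."""
        return {(v, t) for l, v, t in self.msgs if l == loc}

class Thread:
    def __init__(self, mem):
        self.mem, self.promises = mem, set()

    def promise(self, loc, val, ts):
        # A promised write enters memory immediately, so other threads can
        # read it before the promising thread has actually executed it.
        # (The real model also demands certification: a promise-free run
        # fulfilling all outstanding promises must exist.  Elided here.)
        self.mem.write(loc, val, ts)
        self.promises.add((loc, val, ts))

    def fulfil(self, loc, val, ts):
        self.promises.remove((loc, val, ts))

mem = Memory()
t1, t2 = Thread(mem), Thread(mem)
t1.promise("y", 1, 1)             # t1 promises the write [y] := 1
assert (1, 1) in mem.read("y")    # t2 can already read the promised value
t1.fulfil("y", 1, 1)
assert t1.promises == set()
```

In the load buffering example above, it is exactly such a promise of [y] := 1 by the first thread that lets the second thread read 1 before the first thread's own read has executed.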

Since the promising model is rather different from all other (operational and axiomatic) memory models for which a program logic has been developed, none of the existing approaches for proving soundness of concurrent program logics are applicable to our setting. Two key difficulties in the soundness proof come from dealing with promise steps.


To deal with the first challenge, our proof decouples promising steps from ordinary execution steps. We define two semantics of Hoare triples—one "promising", with respect to the full promising semantics, and one "non-promising", with respect to the promising semantics without promising steps—and prove that every Hoare triple that is correct with respect to its non-promising interpretation is also correct with respect to its promising interpretation. This way, we modularise reasoning about promise steps. Even in the non-promising semantics, however, we do allow threads to have outstanding promises. The main difference in the non-promising semantics is that threads are not allowed to issue new promises.

To resolve the second challenge, we observe that in programs verified by SLR, a thread may promise to write to x only if it is able to acquire the necessary write permission before performing the actual write. This follows from promise

<sup>2</sup> As the promising semantics comes with formal proofs of correctness of all the expected local program transformations and of compilation schemes to the x86-TSO, Power, and ARMv8-POP architectures [21], SLR is sound for these architectures too.

<sup>3</sup> Supporting ownership transfer is necessary to provide useful rules for C11 release and acquire accesses.

```
e ∈ Expr ::= n                      integer
           | r                      register
           | e1 op e2               arithmetic

s ∈ Stm  ::= skip | s1; s2 | if e then s1 else s2 | while e do s
           | r := e | r := [e]rlx | r := [e]acq
           | [e1]rlx := e2 | [e1]rel := e2
```
**Fig. 1.** Syntax of the programming language.

certification: the promising semantics requires all promises to be certifiable; that is, for every state of the promising machine, there must exist a non-promising execution of the machine that fulfils all outstanding promises.

We present the SLR assertions and rules informally in Sect. 2. We then give an overview of the promising semantics of Kang et al. [13] in Sect. 3, and use it in Sect. 4 to explain the proof of soundness of SLR. We discuss related work in Sect. 5. *Details of the rules of SLR and its soundness proof can be found in our technical appendix* [1].

### **2 Our Logic**

The novelty of our program logic is to allow non-trivial reasoning about relaxed accesses. Unlike release/acquire accesses, relaxed accesses do not induce synchronisation between threads, so the usual approach of program logics, which relies on ownership transfer, does not apply. Therefore, in addition to reasoning about ownership transfer like a standard separation logic, our logic supports reasoning about relaxed accesses by collecting information about what reads have been observed, and in which order. When combined with information about which writes have been performed, we can deduce that certain executions are impossible.

For concreteness, we consider a minimal "WHILE" programming language with expressions, e ∈ *Expr*, and statements, s ∈ *Stm*, whose syntax is given in Fig. 1. Besides local register assignments, statements also include memory reads with relaxed or acquire mode, and memory writes with relaxed or release mode.

### **2.1 The Assertions of the Logic**

The SLR assertion language is generated by the following grammar, where N, l, v, t, π and X all range over a simply-typed term language which we assume includes booleans, locations, values and expressions of the programming language, fractional permissions, and timestamps, and is closed under pairing, finite sets, and sequences. By convention, we assume that l, v, t, π and X range over terms of type location, value, timestamp, permission, and sets of pairs of values and timestamps, respectively.

$$\begin{array}{rl} P, Q \in Assn ::= & \bot \mid \top \mid P \lor Q \mid P \land Q \mid P \Rightarrow Q \mid \forall x.\, P \mid \exists x.\, P \mid N\_1 = N\_2 \mid \phi(N) \\ & \mid\ P \* Q \mid \mathsf{Rel}(l, \phi) \mid \mathsf{Acq}(l, \phi) \mid \mathsf{O}(l, v, t) \mid \mathsf{W}^\pi(l, X) \mid \nabla P \\ \phi \in Pred ::= & \lambda x.\, P \end{array}$$

The grammar contains the standard operators from first order logic and separation logic, the Rel and Acq assertions from RSL [27], and a few novel constructs.

Rel(l, φ) grants permission to perform a release write to location l and transfer away the invariant φ(v), where v is the value written to that location. Conversely, Acq(l, φ) grants permission to perform an acquire read from location l and gain access to the invariant φ(v), where v is the value returned by the read.

The first novel assertion form, O(l, v, t), records the fact that location l was observed to have value v at timestamp t. The timestamp is used to order it with other reads from the same location. The information this assertion provides is very weak: it merely says that the owner of the assertion has observed that value; it does not imply that any other thread has ever observed it.

The other novel assertion form, W<sup>π</sup>(l, X), asserts ownership of location l and records a set of writes X to that location. The fractional permission π ∈ Q indicates whether ownership is shared or exclusive. Full permission, π = 1, confers exclusive ownership of location l and ensures that X is the set of all writes to location l; any fraction, 0 <π< 1, confers shared ownership and enforces that X is a lower-bound on the set of writes to location l. The order of writes to l is tracked through timestamps; the set X is thus a set of pairs consisting of the value and the timestamp of the write.

In examples where we only need to refer to the order of writes and not the exact timestamps, we write W<sup>π</sup>(x, v̄), where v̄ = [v₁, ..., vₙ] is a list of values, as shorthand for ∃t₁, ..., tₙ. t₁ > t₂ > ··· > tₙ ∗ W<sup>π</sup>(x, {(v₁, t₁), ..., (vₙ, tₙ)}). The W<sup>π</sup>(x, v̄) assertion thus expresses ownership of location x with permission π, and that the writes to x are given by the list v̄ in order, with the most recent write at the front of the list.
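Assuming timestamps are simply numeric, the shorthand can be read as the following small Python sketch (our own encoding, for illustration only):

```python
# The list shorthand for write assertions: a list of values, most recent
# first, stands for a set of (value, timestamp) pairs witnessed by some
# strictly decreasing sequence of timestamps.

def writes_of_list(values, timestamps):
    """Expand the list form given witness timestamps t1 > t2 > ... > tn."""
    assert len(values) == len(timestamps)
    assert all(t1 > t2 for t1, t2 in zip(timestamps, timestamps[1:]))
    return set(zip(values, timestamps))

# E.g. the list [7; 0] could be witnessed by timestamps 5 > 0:
assert writes_of_list([7, 0], [5, 0]) == {(7, 5), (0, 0)}
```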

*Relation Between Reads and Writes.* Records of reads and writes can be confronted by the thread owning the exclusive write assertion: all reads must have read values that were written. This is captured formally by the following property:

$$\mathsf{W}^1(x,X) \* \mathsf{O}(x,a,t) \Rightarrow \mathsf{W}^1(x,X) \* \mathsf{O}(x,a,t) \* (a,t) \in X \quad \text{(Reads-from-Write)}$$

*Random Number Generator.* These assertions allow us to reason about the "random number generator" litmus test from the Introduction, and to show that it cannot read arbitrarily large values. As discussed in the Introduction, capturing the set of values that are written to x, as made possible by the "invariant-based program logic" of Kang et al. [13, Sect. 5.5] and of Jeffrey and Riley [10, Sect. 6], is not enough, and we make use of our stronger reasoning principles. We use O(x, a, t) to record what values reads read from each location, and W<sup>1</sup>(x, ) to record what sequences of values were written to each location, and then confront these records at the end of the execution. The proof sketch is then as follows:

$$\begin{array}{l} \{\mathsf{W}^{1}(y,[0]) \* \ldots\} \\ r\_{1} := [x]\_{\mathtt{rlx}}; \\ \{\mathsf{W}^{1}(y,[0]) \* \mathsf{O}(x,r\_{1},\ldots) \* \ldots\} \\ [y]\_{\mathtt{rlx}} := r\_{1} + 1 \\ \{\mathsf{W}^{1}(y,[r\_{1}+1;0]) \* \mathsf{O}(x,r\_{1},\ldots) \* \ldots\} \end{array} \quad \left\| \quad \begin{array}{l} \{\mathsf{W}^{1}(x,[0]) \* \ldots\} \\ r\_{2} := [y]\_{\mathtt{rlx}}; \\ \{\mathsf{W}^{1}(x,[0]) \* \mathsf{O}(y,r\_{2},\ldots) \* \ldots\} \\ [x]\_{\mathtt{rlx}} := r\_{2} \\ \{\mathsf{W}^{1}(x,[r\_{2};0]) \* \mathsf{O}(y,r\_{2},\ldots) \* \ldots\} \end{array} \right.$$

At the end of the execution, we are able to draw conclusions about the values of the registers. From W<sup>1</sup>(x, [r₂; 0]) and O(x, r₁, ...), we know that r₁ ∈ {r₂, 0} by rule Reads-from-Write. Similarly, we know that r₂ ∈ {r₁ + 1, 0}, and so we can conclude that r₁ = 0. We discuss the distribution of resources at the beginning of a program, and their collection at the end of a program, in Theorem 2. Note that we are unable to establish what values the reads read before the end of the litmus test. Indeed, before the end of the execution, nothing enforces that there are no further writes that the reads could read from.
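The final-state reasoning can also be double-checked by brute force. The following Python snippet is our own illustration, not part of the logic: it enumerates register values consistent with the two facts derived via Reads-from-Write.

```python
# Brute-force check of the RNG conclusion: Reads-from-Write gives
# r1 ∈ {r2, 0} (r1 was read from x, whose writes are [r2; 0]) and
# r2 ∈ {r1 + 1, 0} (r2 was read from y, whose writes are [r1 + 1; 0]).
# Every consistent final state has r1 = 0.

solutions = [(r1, r2)
             for r1 in range(10) for r2 in range(10)
             if r1 in {r2, 0} and r2 in {r1 + 1, 0}]
assert solutions                                  # the test is satisfiable
assert all(r1 == 0 for r1, _ in solutions)        # ...but always with r1 = 0
```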

### **2.2 The Rules of the Logic for Relaxed Accesses**

We now introduce the rules of our logic by focusing on the rules for *relaxed* accesses. In addition, we support the standard rules from separation logic and Hoare logic, rules for release/acquire accesses (Sect. 2.4), and the following consequence rule:

$$\frac{P \Rightarrow P' \qquad \vdash \left\{ P' \right\} c \left\{ Q' \right\} \qquad Q' \Rightarrow Q}{\vdash \left\{ P \right\} c \left\{ Q \right\}}\tag{\text{ConsEq}}$$

which allows one to use "view shifting" implications to strengthen the precondition and weaken the postcondition.

The rules for relaxed accesses are adapted from the rules of RSL [27] for release/acquire accesses, but use our novel resources to track the more subtle behaviour of relaxed accesses. Since relaxed accesses do not introduce synchronisation, they cannot be used to transfer ownership; they can, however, be used to transfer information. For this reason, as in RSL [27], we associate a predicate φ on values to a location x using paired Rel(x, φ) and Acq(x, φ) resources, for writers and readers, respectively. To write v to x, a writer has to provide φ(v), and in exchange, when reading v from x, a reader obtains φ(v). However, here, relaxed writes can only send *pure* predicates (i.e., ones which do not assert ownership of any resources), and relaxed reads can only obtain the assertion from the predicate guarded by a modality ∇<sup>4</sup> through which only pure assertions filter: if P is pure, then ∇P ⇒ P. All assertions expressible in first-order logic are pure.

*Relaxed Write Rule.* To write value v (to which the value expression e₁ evaluates... rather, e₂ evaluates) to location x (to which the location expression e₁ evaluates), the thread needs to own a write permission W<sup>π</sup>(x, X). Moreover, it needs to provide φ(v), the assertion that the Rel(x, φ) assertion associates with the written value v. Because the write is a relaxed write, and therefore does not induce synchronisation, φ(v) has to be a pure predicate. The write rule updates the record of writes with the value written, timestamped with a timestamp newer than any timestamp for that location that the thread has observed so far; this is expressed by relating it to a previous timestamp t that the thread has to provide through an O(x, □, t) assertion in the precondition.

<sup>4</sup> This <sup>∇</sup> modality is similar in spirit, but weaker than that of FSL [8].

$$\frac{\phi(v) \text{ is pure}}{\vdash \left\{ \begin{array}{l} e\_1 = x \ast e\_2 = v \ast \mathsf{W}^\pi(x, X) \\ \ast\ \mathsf{Rel}(x, \phi) \ast \phi(v) \ast \mathsf{O}(x, \Box, t) \end{array} \right\} [e\_1]\_{\mathtt{rlx}} := e\_2 \left\{ \begin{array}{l} \exists t' > t. \\ \mathsf{W}^\pi(x, \{(v, t')\} \cup X) \end{array} \right\}}\tag{\text{W-RLX}}$$

The Rel(x, φ) assertion is duplicable, so there is no need for the rule to keep it.

In practice, O(x, □, t) is taken to be the observation from the last read from x if a read was the last operation on x, and O(x, *fst*(max(X)), *snd*(max(X))) if the last operation on x was a write, including the initial write. The latter can be obtained by

$$\mathsf{W}^{\pi}(x,X) \* (v,t) \in X \Rightarrow \mathsf{W}^{\pi}(x,X) \* \mathsf{O}(x,v,t) \tag{\text{Write-Observed}}$$

*Relaxed Read Rule.* To read from location x (to which the location expression e evaluates), the thread needs to own an Acq(x, φ) assertion, which gives it the right to (almost) obtain the assertion φ(v) upon reading value v from location x. The thread then keeps its Acq(x, φ), and obtains an assertion O(x, r, t′) stating that it has read the value now in register r from location x, timestamped with t′. This timestamp is no older than any timestamp for that location that the thread has observed so far, expressed again by relating it to an O(x, □, t) assertion in the precondition. Moreover, the thread obtains the pure portion ∇φ(r) of the assertion φ(r) corresponding to the value read into register r:

$$\begin{aligned} \vdash \left\{ e = x \ast \mathsf{Acq}(x, \phi) \ast \mathsf{O}(x, \Box, t) \right\} \\ r := [e]\_{\mathtt{rlx}} \\ \left\{ \exists t' \ge t.\ \mathsf{Acq}(x, \phi) \ast \mathsf{O}(x, r, t') \ast \nabla \phi(r) \right\} \end{aligned} \tag{\text{R-RLX}}$$

Again, we can obtain O(x, v<sub>0</sub><sup>x</sup>, 0), where v<sub>0</sub><sup>x</sup> is the initial value of x, from the initial write permission for x, and distribute it to all the threads that will read from x, expressing the fact that the initial value is available to all threads; it then serves as the required O(x, □, t) in the precondition of the read rule.

Moreover, if a thread owns the exclusive write permission for a location x, then it can take advantage of the fact that it is the only writer at that location to obtain more precise information about its reads from that location: they will read the last value it has written to that location.

$$\begin{aligned} \vdash \left\{ e = x \ast \mathsf{Acq}(x, \phi) \ast \mathsf{W}^1(x, X) \right\} \\ r := [e]\_{\mathtt{rlx}} \\ \left\{ \exists t.\ (r, t) = \max(X) \ast \mathsf{Acq}(x, \phi) \ast \mathsf{W}^1(x, X) \ast \mathsf{O}(x, r, t) \ast \nabla \phi(r) \right\} \end{aligned} \tag{\text{R-RLX}\#}$$

*Separation.* With these assertions, we can straightforwardly specify and verify the Disjoint-Lists example. Ownership of an element of a list is simply expressed using a full write permission, W<sup>1</sup>(x, X). This allows including the Disjoint-Lists as a snippet in a larger program where the lists can be shared before or after, and still enforce the separation property we want to establish. While this reasoning sounds underwhelming (and we elide the details), we remark that it is unsound in models that allow OOTA behaviours.

### **2.3 Reasoning About Coherence**

An important feature of many memory models is coherence, that is, the existence of a per-location total order on writes that reads respect. Coherence becomes interesting when there are multiple simultaneous writers to the same location (write/write races). In our logic, write assertions can be split and combined as follows: if π₁ + π₂ ≤ 1, 0 < π₁, and 0 < π₂, then

$$\mathsf{W}^{\pi\_1+\pi\_2}(x, X\_1 \cup X\_2) \Leftrightarrow \mathsf{W}^{\pi\_1}(x, X\_1) \ast \mathsf{W}^{\pi\_2}(x, X\_2) \tag{\text{Combine-Writes}}$$

To reason about coherence, the following rules capture the fact that the timestamps of the writes at a given location are all distinct, and totally ordered:

$$\begin{aligned} \mathsf{W}^{\pi}(x, X) \ast (v, t) \in X \ast (v', t') \in X \ast v \neq v' \Rightarrow \mathsf{W}^{\pi}(x, X) \ast t \neq t' \text{ (Different-Writes)}\\ \mathsf{W}^{\pi}(x, X) \ast (\Box, t) \in X \ast (\Box, t') \in X \Rightarrow \mathsf{W}^{\pi}(x, X) \ast (t < t' \lor t = t' \lor t' < t) \text{ (Writes-Ordered)} \end{aligned}$$

*CoRR2.* One of the basic tests of coherence is the CoRR2 litmus test, which tests whether two threads can disagree on the order of two writes to the same location. The following program, starting with location x holding 0, should not be allowed to finish with r₁ = 1 ∗ r₂ = 2 ∗ r₃ = 2 ∗ r₄ = 1, as that would mean that the third thread sees the write of 1 to x before the write of 2 to x, but that the fourth thread sees the write of 2 before the write of 1:

$$\begin{array}{c@{\quad\|\quad}c@{\quad\|\quad}c@{\quad\|\quad}c} [x]\_{\mathbf{rlx}} := 1 & [x]\_{\mathbf{rlx}} := 2 & \begin{array}{l} r\_1 := [x]\_{\mathbf{rlx}}; \\ r\_2 := [x]\_{\mathbf{rlx}} \end{array} & \begin{array}{l} r\_3 := [x]\_{\mathbf{rlx}}; \\ r\_4 := [x]\_{\mathbf{rlx}} \end{array} \end{array} \tag{CoRR2}$$

Coherence enforces a total order on the writes to x that is respected by the reads, so if the third thread reads 1 then 2, then the fourth cannot read 2 then 1.

We use the timestamps in the O(x, a, t) assertions to record the order in which the reads read values, and then link the timestamps of the reads with those of the writes. Because we do not transfer anything, the predicate for x is again λv. ⊤, and we elide the associated clutter below.

The proof outline for the writers just records what values have been written:

$$\begin{array}{l} \{\mathsf{W}^{1/2}(x,\{(0,0)\})\ast\ldots\} \\ [x]\_{\mathsf{rlx}} := 1 \\ \{\exists t\_1.\mathsf{W}^{1/2}(x,\{(1,t\_1),(0,0)\})\ast\ldots\} \end{array} \left\| \begin{array}{l} \{\mathsf{W}^{1/2}(x,\{(0,0)\})\ast\ldots\} \\ [x]\_{\mathsf{rlx}} := 2 \\ \{\exists t\_2.\mathsf{W}^{1/2}(x,\{(2,t\_2),(0,0)\})\ast\ldots\} \end{array} \right\| $$

The proof outline for the readers just records what values have been read, and—crucially—in which order.

$$\begin{array}{l} \{\mathsf{Acq}(x,\lambda v.\top)\ast\mathsf{O}(x,0,0)\} \\ r\_1 := [x]\_{\mathtt{rlx}}; \\ \{\exists t\_a.\,\mathsf{Acq}(x,\lambda v.\top)\ast\mathsf{O}(x,r\_1,t\_a)\ast 0 \leq t\_a\} \\ r\_2 := [x]\_{\mathtt{rlx}} \\ \{\exists t\_a, t\_b.\,\mathsf{O}(x,r\_1,t\_a)\ast\mathsf{O}(x,r\_2,t\_b)\ast 0 \leq t\_a \ast t\_a \leq t\_b\} \end{array} \left\| \begin{array}{l} r\_3 := [x]\_{\mathtt{rlx}}; \\ r\_4 := [x]\_{\mathtt{rlx}} \end{array} \right. $$

At the end of the program, by combining the two write permissions using rule Combine-Writes, we obtain W¹(x, {(1, t₁), (2, t₂), (0, 0)}). From this, we have t₁ < t₂ or t₂ < t₁ by rules Different-Writes and Writes-Ordered. Now, assuming r₁ = 1 and r₂ = 2, we have tₐ < t_b, and so t₁ < t₂ by rule Reads-from-Write. Similarly, assuming r₃ = 2 and r₄ = 1, we have t₂ < t₁. Therefore, we cannot have r₁ = 1 ∗ r₂ = 2 ∗ r₃ = 2 ∗ r₄ = 1, so coherence is respected, as desired.
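The argument can also be checked exhaustively. The following sketch (the Python names and the concrete timestamps are ours, purely for illustration) enumerates both possible coherence orders for the two writes and all reader outcomes whose observed timestamps are nondecreasing, confirming that the forbidden outcome never arises:

```python
from itertools import product

# Map each written value to its timestamp, under either coherence order
# for the two racy writes (0 is the initial write at timestamp 0).
timestamps = [{0: 0, 1: 1, 2: 2},   # write of 1 ordered before write of 2
              {0: 0, 1: 2, 2: 1}]   # write of 2 ordered before write of 1

for ts in timestamps:
    for r1, r2, r3, r4 in product([0, 1, 2], repeat=4):
        # Each reader's observed timestamps must be nondecreasing.
        if ts[r1] <= ts[r2] and ts[r3] <= ts[r4]:
            # The CoRR2-forbidden outcome is never reachable.
            assert (r1, r2, r3, r4) != (1, 2, 2, 1)
```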

#### **2.4 Handling Release and Acquire Accesses**

Next, consider release and acquire accesses, which, in addition to coherence, provide synchronisation and enable the message passing idiom.

$$\begin{array}{l@{\quad\|\quad}l} [x]\_{\mathbf{rlx}} := 1; & r\_1 := [y]\_{\mathbf{acq}}; \\ {[y]\_{\mathbf{rel}} := 1} & \text{if } r\_1 = 1 \text{ then } r\_2 := [x]\_{\mathbf{rlx}} \end{array} \tag{\text{MP}}$$

The first thread writes data (here, 1) to a location x, and signals that the data is ready by writing 1 to a "flag" location y with a release write. The second thread reads the flag location y with an acquire read, and, if it sees that the first thread has signalled that the data has been written, reads the data. The release/acquire pair is sufficient to ensure that the data is then visible to the second thread.

Release/acquire can be understood abstractly in terms of views [15]: a release write carries the view of the writing thread at the time of the write, and an acquire read updates the view of the reading thread with that of the release write it is reading from. This allows one-way synchronisation of views between threads.
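This view-based intuition can be made concrete in a small sketch (the helper names and the unit timestamps are ours, for illustration only): views are maps from locations to timestamps, a release write attaches the writer's current view to its message, and an acquire read joins that message view into the reader's view.

```python
def lookup(V, x):
    return V.get(x, 0)          # absent locations are implicitly at timestamp 0

def join(V1, V2):               # pointwise maximum of two views
    return {x: max(lookup(V1, x), lookup(V2, x)) for x in V1.keys() | V2.keys()}

# Writer: relaxed write of the data, then a release write of the flag.
writer_view = {}
writer_view = join(writer_view, {'x': 1})   # [x]_rlx := 1, at timestamp 1
flag_msg_view = dict(writer_view)           # [y]_rel := 1 carries the writer's view
writer_view = join(writer_view, {'y': 1})

# Reader: the acquire read of the flag incorporates the message view.
reader_view = {}
reader_view = join(reader_view, flag_msg_view)
assert lookup(reader_view, 'x') >= 1        # the data write is now visible
```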

To handle release/acquire accesses in SLR, we can adapt the rules for relaxed accesses by enabling ownership transfer according to the predicate associated with the Rel and Acq permissions. The resulting rules are strictly more powerful than the corresponding RSL [27] rules, as they also allow us to reason about coherence.

*Release Write Rule.* The release write rule is the same as for relaxed writes, but does not require the predicate to be pure, thereby allowing the transfer of actual resources, rather than just information:

$$\begin{aligned} \vdash \left\{ e\_1 = x \ast e\_2 = v \ast \mathsf{W}^\pi(x, X) \ast \mathsf{Rel}(x, \phi) \ast \phi(v) \ast \mathsf{O}(x, \Box, t) \right\} \\ [e\_1]\_{\mathsf{rel}} := e\_2 \\ \{\exists t' \ge t. \ \mathsf{W}^\pi(x, \{(v, t')\} \cup X)\} \end{aligned} \tag{W-\text{REL}}$$

*Acquire Read Rule.* Symmetrically, the acquire read rule is the same as for relaxed reads, but allows the actual resource to be obtained, not just its pure portion:

$$\begin{aligned} \vdash \left\{ e = x \ast \mathsf{Acq}(x, \phi) \ast \mathsf{O}(x, \Box, t) \right\} \\ r := [e]\_{\mathsf{acq}} \\ \{ \exists t' \ge t. \; \mathsf{Acq}(x, \phi[r \mapsto \top]) \ast \mathsf{O}(x, r, t') \ast \phi(r) \} \end{aligned} \tag{R-\text{ACQ}}$$

We have to update φ to record the fact that we have obtained the resource associated with reading that value, so that we do not erroneously obtain that resource twice; φ[v ↦ P] stands for λv′. *if* v′ = v *then* P *else* φ(v′).
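The predicate update admits a direct transcription; in the minimal sketch below (names are ours), predicates are modelled as functions from values to assertions, with plain strings standing in for assertions:

```python
def update(phi, v, P):
    """phi[v -> P]: lambda v'. P if v' == v else phi(v')."""
    return lambda w: P if w == v else phi(w)

phi = lambda v: f"phi({v})"
# After acquiring the resource for value 1, replace phi(1) by True
# so the resource cannot be obtained a second time.
phi2 = update(phi, 1, "True")
assert phi2(1) == "True"
assert phi2(2) == "phi(2)"
```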

As for relaxed accesses, we can strengthen the read rule when the reader is also the exclusive writer to that location:

$$\begin{array}{l} \vdash \left\{ \mathsf{Acq}(x,\phi) \ast \mathsf{W}^{1}(x,X) \right\} \\ r := [x]\_{\mathsf{acq}} \\ \left\{ \exists t.\, (r,t) = \max(X) \ast \mathsf{Acq}(x,\phi[r \mapsto \top]) \ast \mathsf{W}^{1}(x,X) \ast \mathsf{O}(x,r,t) \ast \phi(r) \right\} \end{array} \tag{\text{R-ACQ}^\ast}$$

Additionally, we allow duplication of release assertions and splitting of acquire assertions, as expressed by the following two rules:

$$\mathsf{Rel}(x,\phi)\Leftrightarrow\mathsf{Rel}(x,\phi)\*\mathsf{Rel}(x,\phi)\tag{\mathsf{Release-Duplicate}}$$

$$\mathsf{Acq}(x,\lambda v.\phi\_1(v)\*\phi\_2(v))\Rightarrow\mathsf{Acq}(x,\phi\_1)\*\mathsf{Acq}(x,\phi\_2)\tag{\mathsf{Acquire-Split}}$$

*Message Passing.* With these rules, we can easily verify the message passing example. Here, we want to transfer a resource from the writer to the reader, namely the state of the data, x. By transferring the write permission for the data to the reader over the "flag" location, y, we allow the reader to use it to read the data precisely. We do that by picking the predicate

$$\phi\_y = \lambda v.\ (v = 1 \land \mathsf{W}^1(x, [1; 0])) \lor v \neq 1$$

for y. Since we do not transfer any resource using x, the predicate for x is λv. ⊤.

The writer transfers the write permissions for x away on y using φy:

$$\begin{array}{l} \{\mathsf{W}^{1}(x,[0]) \ast \mathsf{Rel}(x,\lambda v.\top) \ast \mathsf{W}^{1}(y,[0]) \ast \mathsf{Rel}(y,\phi\_{y})\} \\ [x]\_{\mathsf{rlx}} := 1; \\ \{\mathsf{W}^{1}(x,[1;0]) \ast \mathsf{W}^{1}(y,[0]) \ast \mathsf{Rel}(y,\phi\_{y})\} \\ \{\mathsf{W}^{1}(y,\{(0,0)\}) \ast \mathsf{Rel}(y,\phi\_{y}) \ast \phi\_{y}(1) \ast \mathsf{O}(y,0,0)\} \\ [y]\_{\mathsf{rel}} := 1 \\ \{\exists t\_{1}.\,\mathsf{W}^{1}(y,\{(1,t\_{1})\} \cup \{(0,0)\}) \ast 0 < t\_{1}\} \\ \{\mathsf{W}^{1}(y,[1;0]) \ast \mathsf{Rel}(y,\phi\_{y})\} \end{array}$$

The proof outline for the reader uses the acquire permission Acq(y, φ_y) to obtain W¹(x, [1; 0]), which it then uses to know that it reads 1 from x.

$$\begin{array}{l} \{\mathsf{Acq}(y, \phi\_y) \ast \mathsf{O}(y, 0, 0) \ast \mathsf{Acq}(x, \lambda v. \top)\} \\ r\_1 := [y]\_{\mathsf{acq}}; \\ \{\exists t\_1^y \ge 0.\, \mathsf{Acq}(y, \phi\_y[r\_1 \mapsto \top]) \ast \mathsf{O}(y, r\_1, t\_1^y) \ast \phi\_y(r\_1) \ast \mathsf{Acq}(x, \lambda v. \top)\} \\ \text{if } r\_1 = 1 \text{ then} \\ \quad \{\mathsf{W}^1(x, [1; 0]) \ast \mathsf{Acq}(x, \lambda v. \top)\} \\ \quad r\_2 := [x]\_{\mathsf{rlx}} \\ \quad \{\mathsf{Acq}(x, \lambda v. \top) \ast \mathsf{W}^1(x, [1; 0]) \ast r\_2 = 1\} \\ \{r\_1 = 1 \implies r\_2 = 1\} \end{array}$$

#### **2.5 Plain Accesses**

Our formal development (in the technical appendix) also features the usual "partial ownership" assertion $x \mapsto^{\pi} v$ for "plain" (non-atomic) locations, and the usual corresponding rules.

### **3 The Promising Semantics**

In this section, we provide an overview of the promising semantics [13], the model for which we prove SLR sound. Formal details can be found in [1,13].

The promising semantics is an operational semantics that interleaves execution of the threads of a program. Relaxed behaviour is introduced in two ways. First, writes are recorded in a timestamped per-location history, and a reading thread may pick any message that its view does not rule out as stale, not just the most recent one. Second, a thread may *promise* a future write, making it available to the other threads early, provided it can *certify* the promise, i.e., fulfil it by executing on its own.


The behaviour of promising steps can be illustrated on the LB+data+fakedep litmus test from the Introduction. The second thread can, at the very start of the execution, promise a write of 1 to x, because it can, by running on its own from the current state, read from y (it will read 0), then write 1 to x (because 0+1 − 0 = 1), thereby fulfilling its promise. On the other hand, the first thread cannot promise a write of 1 to y at the beginning of the execution, because, by running on its own, it can only read 0 from x, and therefore only write 0 to y.
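The certification argument for LB+data+fakedep can be made concrete in a drastically simplified setting (no views or timestamps; the function names are ours): each thread, run alone on the current memory, determines the single write it would perform, and a promise is certifiable only when that solo run produces the promised write.

```python
def thread1(mem):
    # r := [x]; [y] := r  -- can only write what it read from x
    r = mem['x']
    return ('y', r)

def thread2(mem):
    # r := [y]; [x] := r + 1 - r  -- always writes 1 (the fake dependency)
    r = mem['y']
    return ('x', r + 1 - r)

def certifiable(thread, mem, loc, val):
    """A promise of (loc, val) is certifiable if the thread, running on
    its own from the current memory, performs exactly that write."""
    return thread(mem) == (loc, val)

mem0 = {'x': 0, 'y': 0}
assert certifiable(thread2, mem0, 'x', 1)       # 0 + 1 - 0 = 1: promise OK
assert not certifiable(thread1, mem0, 'y', 1)   # reads x = 0, can only write 0
```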

#### **3.1 Storage Subsystem**

Formally, the semantics keeps track of writes and promises in a *global configuration*, *gconf* = ⟨M, P⟩, where M is a memory and P ⊆ M is the *promise memory*. We denote by *gconf*.M and *gconf*.P the components of *gconf*. Both *memories* are finite sets of messages, where a *message* is a tuple ⟨x : v@t, R, i, o⟩, where x ∈ *Loc* is the location of the message, v ∈ *Val* its value, i ∈ *Tid* its originating thread, t ∈ *Time* its *timestamp*, R its message *view*, and o ∈ {rlx, rel} its message mode. Here, *Time* is an infinite set of timestamps, densely totally ordered by ≤, with a minimum element, 0. (We return to views later.) We denote by m.loc, m.val, m.tid, m.time, m.view, and m.mod the components of a message m. We use the following notation to restrict memories:

$$\begin{aligned} M(i) & \stackrel{\text{def}}{=} \{ m \in M \mid m.\mathtt{tid} = i \} & M(\mathtt{rel}) & \stackrel{\text{def}}{=} \{ m \in M \mid m.\mathtt{mod} = \mathtt{rel} \} \\ M(x) & \stackrel{\text{def}}{=} \{ m \in M \mid m.\mathtt{loc} = x \} & M(\mathtt{rlx}) & \stackrel{\text{def}}{=} \{ m \in M \mid m.\mathtt{mod} = \mathtt{rlx} \} \\ M(i, x) & \stackrel{\text{def}}{=} M(i) \cap M(x) \end{aligned}$$
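As a minimal sketch (the Python names are ours), messages and the restriction notation can be transcribed directly:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    loc: str
    val: int
    tid: int
    time: float
    mod: str            # 'rlx' or 'rel'
    view: tuple = ()    # message view R, kept abstract (empty = bottom view)

def by_tid(M, i): return {m for m in M if m.tid == i}        # M(i)
def by_loc(M, x): return {m for m in M if m.loc == x}        # M(x)
def by_mod(M, o): return {m for m in M if m.mod == o}        # M(rlx), M(rel)
def by_tid_loc(M, i, x): return by_tid(M, i) & by_loc(M, x)  # M(i, x)

M = {Message('x', 0, 0, 0.0, 'rlx'),
     Message('x', 1, 1, 1.0, 'rel'),
     Message('y', 2, 1, 1.0, 'rlx')}
assert len(by_loc(M, 'x')) == 2
assert by_tid_loc(M, 1, 'y') == by_mod(M, 'rlx') & by_tid(M, 1)
```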

A global configuration *gconf* evolves in two ways. First, a message can be "promised" and be added both to *gconf*.M and *gconf*.P. Second, a message can be written, in which case it is either added to *gconf*.M, or removed from *gconf*.P (if it was promised before).

### **3.2 Thread Subsystem**

A *thread state* is a pair *TS* = ⟨σ, V⟩, where σ is the internal state of the thread and V is a *view*. We denote by *TS*.σ and *TS*.V the components of *TS*.

*Thread Internal State.* The internal state σ consists of a thread store (denoted σ.μ) that assigns values to local registers and a statement to execute (denoted σ.s). The transitions of the thread internal state are labeled with *memory actions* and are given by an ordinary sequential semantics. As these are routine, we leave their description to the technical appendix.

*Views.* Thread views are used to enforce coherence, that is, the existence of a per-location total order on writes that reads respect. A view is a function V : *Loc* → *Time*, which records how far the thread has seen in the history of each location. To ensure that a thread does not read stale messages, its view restricts the messages the thread may read, and is increased whenever a thread observes a new message. Messages themselves also carry a view (the thread's view when the message comes from a release write, and the bottom view otherwise) which is incorporated in the thread view when the message is read by an acquire read.

*Additional Notations.* The order on timestamps, ≤, is extended pointwise to views. ⊥ and ⊔ denote the natural bottom element and join operation for views. {x@t} denotes the view assigning t to x and 0 to all other locations.
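These notations admit a direct transcription; in the sketch below (names are ours), a view is a dict from locations to timestamps, with absent locations implicitly at 0:

```python
BOT = {}                                   # the bottom view: everything at 0

def lookup(V, x):
    return V.get(x, 0)

def leq(V1, V2):                           # pointwise extension of <=
    return all(lookup(V1, x) <= lookup(V2, x) for x in V1.keys() | V2.keys())

def join(V1, V2):                          # pointwise maximum (the view join)
    return {x: max(lookup(V1, x), lookup(V2, x)) for x in V1.keys() | V2.keys()}

def singleton(x, t):                       # the view {x@t}
    return {x: t}

assert leq(BOT, singleton('x', 3))
assert join(singleton('x', 3), singleton('y', 1)) == {'x': 3, 'y': 1}
```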

### **3.3 Interaction Between a Thread and the Storage Subsystem**

The interaction between a thread and the storage subsystem is given in terms of transitions of *thread configurations*. Thread configurations are tuples ⟨*TS*, M, P⟩, where *TS* is a thread state, and ⟨M, P⟩ is a global configuration. These transitions are labelled with β ∈ {NP, prom} in order to distinguish whether they involve promises or not. A thread can:


### **3.4 Constraining Promises**

Now that we have described how threads and promises interact with memory, we can present the certification condition for promises, which is essential to avoid out-of-thin-air behaviours. Accordingly, we define another transition system, =⇒, on top of the previous one, which enforces that the memory remains "consistent", that is, all the promises that have been made can be certified. A thread configuration ⟨*TS*, M, P⟩ is called *consistent* w.r.t. i ∈ *Tid* if thread i can fulfil its promises by executing on its own, or more formally, if $\langle TS, M, P \rangle \xrightarrow{\mathtt{NP}}{}^{\ast}\_{i} \langle TS', M', P' \rangle$ for some *TS*′, M′, P′ such that P′(i) = ∅. Certification is *local*, that is, only thread i is executing during its certification; this is crucial to avoid out-of-thin-air. Further, the certification itself cannot make additional promises, as it is restricted to NP-steps. Here is a visual representation of a promise machine run, together with certifications.

The thread configuration =⇒-transitions allow a thread to (1) take any number of non-promising steps, provided its thread configuration at the end of the sequence of steps (intuitively speaking, when it gives control back to the scheduler) is consistent, or (2) take a promising step, again provided that its thread configuration after the step is consistent.

#### **3.5 Full Machine**

Finally, the full machine transitions simply lift the thread configuration =⇒ transitions to the machine level. A *machine state* is a tuple **MS** = ⟨TS, M, P⟩, where TS is a function assigning a thread state *TS* to every thread, and ⟨M, P⟩ is a global configuration. The initial state **MS**₀ (for a given program) consists of the function TS₀ mapping each thread i to its initial state ⟨σᵢ⁰, ⊥⟩, where σᵢ⁰ is the thread's initial local state and ⊥ is the zero view (all timestamps in views are 0); the initial memory M₀ consisting of one message ⟨x : 0@0, ⊥, 0, rlx⟩ for each location x; and the empty set of promises.

### **4 Semantics and Soundness**

In this section, we present the semantics of SLR, and give a short overview of the soundness proof. Our focus is not on the technical details of the proof, but on the two main challenges in defining the semantics and proving soundness:


### **4.1 The Intuition**

SLR assertions are interpreted by (sets of) *resources*, which represent permissions to write to a certain location and/or to obtain further resources by reading a certain message from memory. As is common in semantics of separation logics, the resources form a partial commutative monoid, and SLR's separating conjunction is interpreted as the composition operation of the monoid.

When defining the meaning of a Hoare triple {P} s {Q}, we think of the promise machine as if it were manipulating resources: each thread owns some resources and operates using them. The intuitive description of the Hoare triple semantics is that every run of the program s starting from a state containing the resources described by the precondition, P, will be "correct" and, if it terminates, will finish in a state containing the resources described by the postcondition, Q. The notion of a program running correctly can be described in terms of threads "respecting" the resources they own; for example, if a thread is executing a write or fulfilling a promise, it should own a resource representing the write permission.

#### **4.2 A Closer Look at the Resources and the Assertion Semantics**

We now take a closer look at the structure of resources and the semantics of assertions, whose formal definitions can be found in Figs. 2 and 3.

The idea is to interpret assertions as predicates over triples consisting of a memory, a view, and a resource. We use the resource component to model assertions involving ownership (i.e., write assertions and acquire assertions), and model other assertions using the memory and view components. Once a resource is no longer needed, SLR allows us to drop it from assertions: P ∗ Q ⇒ P. To model this, we interpret assertions as upwards-closed predicates, which may own more than explicitly asserted. The ordering on memories and views is given by the promising semantics, and the ordering on resources is induced by the composition operation in the resource monoid. For now, we leave the resource composition unspecified, and return to it later.

$$\begin{array}{c} \iota \in PredId \stackrel{\text{def}}{=} \mathbb{N} \quad \text{(predicate identifiers)}\\ Perm \stackrel{\text{def}}{=} \{\pi \in \mathbb{Q} \mid 0 \le \pi \le 1\} \quad \text{(fractional permissions)}\\ Write \stackrel{\text{def}}{=} \mathcal{P}(Val \times Time) \\ WrPerm \stackrel{\text{def}}{=} Loc \to \{ (\pi, X) \in Perm \times Write \mid \pi = 0 \Rightarrow X = \emptyset \} \\ AcqPerm \stackrel{\text{def}}{=} Loc \to \mathcal{P}(PredId) \\ r = (r.\mathtt{wr}, r.\mathtt{acq}) \in Res \stackrel{\text{def}}{=} WrPerm \times AcqPerm \quad \text{(resources)} \\ \mathcal{W} = (\mathcal{W}.\mathtt{rel}, \mathcal{W}.\mathtt{acq}) \in World \stackrel{\text{def}}{=} (Loc \to Pred) \times (PredId \to\_{\text{fin}} Pred) \quad \text{(worlds)} \\ Prop \stackrel{\text{def}}{=} World \to\_{\text{mon}} \mathcal{P}^{\uparrow}(Mem \times View \times Res) \end{array}$$

**Fig. 2.** Semantic domains used in this section.

In addition, however, we have to deal with assertions that are parametrised by predicates (in our case, Rel(x, φ) and Acq(x, φ)). Doing so is not straightforward, because naïve attempts at giving semantics to such assertions result in circular definitions. A common technique for avoiding this circularity is to treat predicates stored in assertions syntactically, and to interpret assertions relative to a *world*, which is used to interpret those syntactic predicates. In our case, worlds consist of two components: the W.rel component associates a syntactic SLR predicate with every location (this component is used to interpret release permissions), while the W.acq component associates a syntactic predicate with each of a finite set of currently allocated predicate identifiers (this component is used to interpret acquire permissions). The reason for the more complex structure for acquire permissions is that they can be split (see (Acquire-Split)). Therefore, we allow multiple predicate identifiers to be associated with a single location. When acquire permissions are divided and split between threads, new predicate identifiers are allocated and associated with predicates in the world. The world ordering, W₁ ≤ W₂, expresses that world W₂ is an extension of W₁ in which new predicate identifiers may have been allocated, but all existing predicate identifiers are associated with the same predicates.
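The world ordering on the acquire component can be sketched as follows (assuming, purely for illustration, that this component is a dict from predicate identifiers to syntactic predicates, here represented by strings):

```python
def world_leq(W1, W2):
    """W1 <= W2: W2 may allocate new predicate identifiers, but every
    identifier already allocated in W1 must keep the same predicate."""
    return all(i in W2 and W2[i] == W1[i] for i in W1)

W1 = {0: 'phi_y'}                       # one allocated identifier
W2 = {0: 'phi_y', 1: 'phi_y_half'}      # extension: a fresh identifier added
assert world_leq(W1, W2)
assert not world_leq(W2, W1)            # extensions cannot be undone
```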

Let us now focus our attention on the assertion semantics. The semantics of assertions, $\llbracket P \rrbracket^{\mu}\_{\eta}$, is relative to a thread store μ that assigns values to registers, and an environment η that assigns values to logical variables.

The standard logical connectives and quantifiers are interpreted following their usual intuitionistic semantics. The semantics of our novel assertions is given in Fig. 3 and can be explained as follows:

– The observed assertion O(x, v, t) says that the memory contains a message at location x with value v and timestamp t, and the current thread knows about it (i.e., the thread view contains it).


Note that W^π(x, X), Acq(x, φ), and Rel(x, φ) only talk about owning certain resources, and do not constrain the memory itself at all. In the next subsection, we explain how we relate the abstract resources to the concrete machine state.

**Fig. 3.** Interpretation of SLR assertions, $\llbracket \cdot \rrbracket^{\mu}\_{\eta} : \mathit{Assn} \to \mathit{Prop}$

### **4.3 Relating Concrete State and Resources**

Before giving a formal description of the relationship between abstract resources and concrete machine states, we return to the intuition of threads manipulating resources presented in Sect. 4.1.

Consider what happens when a thread executes a release write to a location x. At that point, the thread has to own a release resource represented by Rel(x, φ), and to store the value v, it has to own the resources represented by φ(v). As the write is executed, the thread gives up the ownership of the resources corresponding to φ(v). Conversely, when a thread that owns the resource represented by Acq(x, φ) performs an acquire read of a value v from location x, it will gain ownership of resources satisfying φ(v). However, this picture does not account for *what happens to the resources that are "in flight"*, i.e., the resources that have been released, but not yet acquired.

Our approach is to associate in-flight resources to messages in the memory. When a thread does a release write, it attaches the resources it released to the message it just added to the memory. That way, a thread performing an acquire read from that message can easily take ownership of the resources that are associated to the message. Formally, as the execution progresses, we update the assignment of resources to messages,

$$u \colon M(\mathtt{rel}) \to (PredId \to Res).$$

For every release message in memory M, the message resource assignment u gives us a mapping from predicate identifiers to resources. Here, we again use predicate identifiers to be able to track which acquire predicate is being satisfied by which resource. The intended reading of u(m)(ι) = r is that the resource r attached to the message m satisfies the predicate with the identifier ι.

We also require that the resources attached to a message (i.e., the resources released by the thread that wrote the message) suffice to satisfy all the acquire predicates associated with that particular location. Together, these two properties of our message resource assignment, as formalised in Fig. 4, allow us to describe the release/acquire ownership transfer.

$$\begin{array}{l} M \models r, u, \mathcal{W} \stackrel{\text{def}}{=} \\ \quad \forall m \in M(\mathtt{rel}).\ r.\mathtt{acq}(m.\mathtt{loc}) = dom(u(m)) \\ \qquad \wedge\ \forall \iota \in dom(u(m)).\ (M, m.\mathtt{view}, u(m)(\iota)) \in \llbracket \mathcal{W}.\mathtt{acq}(\iota)(m.\mathtt{val}) \rrbracket(\mathcal{W}) \\ \quad \wedge\ \forall x, v.\ \models \mathcal{W}.\mathtt{rel}(x)(v) \Rightarrow \circledast\_{\iota \in r.\mathtt{acq}(x)}\, \mathcal{W}.\mathtt{acq}(\iota)(v) \\ \quad \wedge\ \forall m \in dom(u).\ dom(u(m)) \subseteq dom(\mathcal{W}.\mathtt{acq}) \\ \quad \wedge\ \forall m \in M(\mathtt{rlx}).\ (\langle \emptyset, \emptyset \rangle, \lambda x.\,0, \varepsilon) \in \llbracket \mathcal{W}.\mathtt{rel}(m.\mathtt{loc})(m.\mathtt{val}) \rrbracket(\mathcal{W}) \end{array}$$

**Fig. 4.** Message resource satisfaction.

The last condition in the message resource satisfaction relation has to do with relaxed accesses. Since relaxed accesses do not provide synchronisation, we disallow ownership transfer through them. Therefore, we require that the release predicates connected with the relaxed messages are satisfiable with the empty resource. This condition, together with the requirement that the released resources satisfy acquire predicates, forbids ownership transfer via relaxed accesses.

The resource missing from the discussion so far is the write resource (modelling the W^π(x, X) assertion). Intuitively, we would like to have the following property: whenever a thread adds a message to the memory, it has to own the corresponding write resource. Recall that there are two ways a thread can produce a new message: by simply performing a write, or by promising the write and fulfilling the promise later. In the first case, the thread owns the corresponding write resource at the time of the write. In the second case, however, the thread need not own the write resource at the time it makes the promise, but will acquire the appropriate resource by the time it fulfils the promise. So, in order to assert that the promise step respects the resources owned by the thread, we also need to be able to talk about the resources that the thread *can acquire* in the future.

When dealing with the promises, the saving grace comes from the fact that all promises have to be certifiable, i.e., when issuing a promise a thread has to be able to fulfil it without help from other threads.

Intuitively, the existence of a certification run tells us that even though at the moment a thread issues a promise, it might not have the resources necessary to actually perform the corresponding write, the thread should, by running uninterrupted, still be able to obtain the needed resources before it fulfils the promise. This, in turn, tells us that the needed resources have to be already released by the other threads by the time the promise is made: only resources attached to messages in the memory are available to be acquired, and only the thread that made the promise is allowed to run during the certification; therefore all the available resources have already been released.

The above reasoning shows what it means for the promise steps to "respect resources": when promises are issued, the resources currently owned by a thread, together with all the resources it is able to acquire according to the resources it owns and the current assignment of resources to messages, have to contain the appropriate write resource for the write being promised. The notion of "resources a thread is able to acquire" is expressed through the canAcq(r, u) predicate. canAcq(r, u) performs a fixpoint calculation: the resources we have (r) allow us to acquire some more resources from the messages in memory (assignment of resources to messages is given by u), which allows us to acquire some more, and so on. Its formal definition can be found in the technical appendix, and hinges on the fact that u precisely tracks which resources satisfy which predicates.
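The fixpoint flavour of canAcq can be sketched as follows. This is an illustration under strong simplifying assumptions of our own making (resources are sets of tokens, composition is union, and a token ('acq', x, ι) permits taking the resource attached to a message at x under identifier ι); it is not the paper's actual definition, which lives in the technical appendix.

```python
def can_acq(r, u):
    """Least fixpoint: everything reachable by repeatedly acquiring
    resources attached to messages. u is a list of (loc, {pred_id: resource})
    pairs, one per release message."""
    acc = set(r)
    changed = True
    while changed:
        changed = False
        for loc, attached in u:
            for i, res in attached.items():
                # We may take what is attached under i if we hold the
                # corresponding acquire token, and it adds something new.
                if ('acq', loc, i) in acc and not res <= acc:
                    acc |= res
                    changed = True
    return frozenset(acc)

r = frozenset({('acq', 'y', 0)})
u = [('y', {0: frozenset({('wr', 'x'), ('acq', 'z', 1)})}),
     ('z', {1: frozenset({('wr', 'w')})})]
# The write permission for w is reachable only transitively, via z.
assert ('wr', 'w') in can_acq(r, u)
```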

**Fig. 5.** Resource composition.

An important element that was omitted from the discussion so far is the definition of the composition in the resource monoid *Res*. The resource composition, defined in Fig. 5, follows the expected notion of per-component composition. The most important feature is in the composition of write resources: a full permission write resource is only composable with the empty write resource.
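For write resources at a single location, this can be sketched as follows (the Python names are ours; a write resource is a pair (π, X) with π = 0 forcing X = ∅, as in Fig. 2):

```python
from fractions import Fraction

def compose_wr(w1, w2):
    """Compose two write resources at one location; None if undefined."""
    (p1, X1), (p2, X2) = w1, w2
    if p1 + p2 > 1:
        return None              # undefined: fractions would exceed 1
    return (p1 + p2, X1 | X2)

full = (Fraction(1), frozenset({(1, 1), (0, 0)}))
half = (Fraction(1, 2), frozenset({(2, 2)}))
empty = (Fraction(0), frozenset())

assert compose_wr(full, empty) == full   # full composes only with empty
assert compose_wr(full, half) is None    # anything more is undefined
assert compose_wr(half, half)[0] == 1    # two halves recombine to full
```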

At this point, we are equipped with all the necessary ingredients to relate abstract states represented by resources to concrete states ⟨M, P⟩ (where M is

$$\begin{array}{l} \left| r\_F, u, \mathcal{W} \right|\_T \stackrel{\text{def}}{=} \{\, \langle M, P \rangle \mid \textbf{let } r = \prod\_{i \in Tid} r\_F(i) \bullet \prod\_{m \in M} \prod\_{\iota \in dom(u(m))} u(m)(\iota) \textbf{ in} \\ \quad (1)\ \ M \models r, u, \mathcal{W}\ \wedge \\ \quad \ \cdots \\ \quad (3)\ \ \forall m \in P.\ m.\mathtt{tid} \notin T \Rightarrow (r\_F(m.\mathtt{tid}) \bullet \mathsf{canAcq}(r\_F(m.\mathtt{tid}), u)).\mathtt{wr}(m.\mathtt{loc}).\mathtt{perm} > 0 \,\} \end{array}$$

#### **Fig. 6.** Erasure.

memory, and P is the set of promised messages). We define a function, called *erasure*, that, given an assignment of resources to threads, $r\_F : \mathit{ThreadId} \to \mathit{Res}$, an assignment of resources to messages, u, and a world, W, gives us a set of concrete states satisfying the following conditions:


Our formal notion of erasure, defined in Fig. 6, has an additional parameter, a set of thread identifiers T. This set allows us to exclude the promises of the threads in T from the requirement of respecting the resources. As we will see in the following subsection, this additional parameter plays a subtle, but key, role in the soundness proof. (The notion of erasure described above corresponds to the case where T = ∅.)

Note also that the arguments of erasure very precisely account for who owns which part of the total resource. This diverges from the usual approach in separation logic, where we just give the total resource as the argument to the erasure. Our approach is motivated by Lemma 1, which states that a reader that owns the full write resource for location x knows which value it is going to read from x. This is the key lemma in the soundness proof of the (R-RLX\*) and (R-ACQ\*) rules.

**Lemma 1.** If $(M, V, r\_F(i)) \in \llbracket \mathsf{W}^1(x, X) \rrbracket^{\mu}\_{\eta}(\mathcal{W})$ and $\langle M, P \rangle \in \left| r\_F, u, \mathcal{W} \right|\_{\{i\}}$, then for all messages $m \in M(x) \setminus P(i)$ such that $V(x) \le m.\mathtt{time}$, we have $m.\mathtt{val} = \mathit{fst}(\max(X))$.

Lemma 1 is looking from the perspective of a thread i that owns the full write resource for the location x. This is expressed by $(M, V, r\_F(i)) \in \llbracket \mathsf{W}^1(x, X) \rrbracket^{\mu}\_{\eta}(\mathcal{W})$ (recall that $r\_F(i)$ is the resource owned by thread i). Furthermore, the lemma assumes that the concrete state respects the abstract resources, expressed by $\langle M, P \rangle \in \left| r\_F, u, \mathcal{W} \right|\_{\{i\}}$. Under these assumptions, the lemma intuitively tells us that the current thread knows which value it will read from x. Formally, the lemma says that all the messages thread i is allowed to read (i.e., messages in the memory that are not outstanding promises of thread i and whose timestamp is greater than or equal to the view of thread i) have the value that appears as the maximal element of the set X.

To see why this lemma holds, consider a message m ∈ M(x) \ P(i). If m is an unfulfilled promise by a different thread j, then, by erasure, it follows that j currently owns or can acquire at least a shared write permission for x. However, this is a contradiction, since thread i currently owns the exclusive write permission, and, by erasure, $r_F(i)$ is disjoint from the resources of all other threads and all resources currently associated with messages by u. Hence, m must be a fulfilled write. By erasure, it follows that the set of fulfilled writes to x is given by the combination of all write permissions. Since $r_F(i)$ owns the exclusive write permission, this is just $r_F(i).\mathit{wr}$. Hence, the set of fulfilled writes is X, and the value of the last fulfilled write is *fst*(max(X)).

Note that in the reasoning above, it is crucial to know which thread and which message owns which resource. Without precisely tracking this information, we would be unable to prove Lemma 1.

### **4.4 Soundness**

Now that we have our notion of erasure, we can proceed to formalise the meaning of triples, and present the key points of the soundness proof.

Recall our intuitive reading of Hoare triples: the program only makes steps that respect the resources it owns. This notion is formalised using the *safety* predicate, whose formal definition we give in Fig. 7. Somewhat simplified, safety states that it is always safe to perform zero steps, and that performing n + 1 steps is safe if the following two conditions hold:


$$\begin{array}{l}
\mathsf{safe}_0(\sigma,B)(\mathcal{W}_1) \stackrel{\text{def}}{=} \mathit{Mem} \times \mathit{View} \times \mathit{Res}\\[2pt]
\mathsf{safe}_{n+1}(\sigma,B)(\mathcal{W}_1) \stackrel{\text{def}}{=} \{\langle M_1,V_1,r_1\rangle \mid \forall (M,V,r) \ge (M_1,V_1,r_1),\ \forall \mathcal{W} \ge \mathcal{W}_1.\\
\quad (\sigma.s = \mathsf{skip} \Rightarrow \langle M,V,r\rangle \in vs(B(\sigma.\mu))(\mathcal{W}))\\
\quad {}\wedge\ (\forall P, r_F, \sigma', M', P', V', u, i.\ \langle M,P\rangle \in [r_F[i \mapsto r], u, \mathcal{W}]_{\emptyset} \wedge{}\\
\qquad \langle\langle\sigma,V\rangle,\langle M,P\rangle\rangle \longrightarrow_i \langle\langle\sigma',V'\rangle,\langle M',P'\rangle\rangle\\
\qquad {}\Rightarrow \exists r', u', \mathcal{W}' \ge \mathcal{W}.\ \langle M',P'\rangle \in [r_F[i \mapsto r'], u', \mathcal{W}']_{\emptyset} \wedge{}\\
\qquad\quad \langle\langle M',P'\rangle,V',r'\rangle \in \mathsf{safe}_n(\sigma',B)(\mathcal{W}')\}
\end{array}$$

**Fig. 7.** Safety.


The semantics of Hoare triples is simply defined in terms of the safety predicate. The triple {P} s {Q} holds if every logical state satisfying the precondition is safe for any number of steps:

$$\models \{P\}\ s\ \{Q\} \stackrel{\text{def}}{=} \forall n, \mu, \eta, \mathcal{W}.\ \|P\|_{\mu}^{\eta}(\mathcal{W}) \subseteq \mathsf{safe}_n((\mu, s), \lambda\mu'.\ \|Q\|_{\mu'}^{\eta})(\mathcal{W})$$

To establish soundness of the SLR proof rules, we have to prove that the safety predicate holds for an arbitrary number of steps, including promise steps. The trouble with reasoning about promise steps is that they can nondeterministically appear at any point in the execution. Therefore, we have to account for them in the soundness proof of every rule of our logic. To make this task manageable, we encapsulate the reasoning about promise steps in a single theorem, thus enabling the proofs of soundness for the proof rules to consider only the non-promise steps.

To do so, once again certification runs for promises play a pivotal role. Recall that whenever a thread makes a step, it has to be able to fulfil its promises without help from other threads (Sect. 3.4). Since there will be no interference by other threads, performing promise steps during certification is of no use (because promises can only be used by other threads). Therefore, we can assume that the certification runs are always promise-free.

Now that we have noted that certifications are promise-free, the key idea behind encapsulating the reasoning about promises is as follows. If we know that all executions of our program are safe for arbitrarily many non-promising steps, we can use this to conclude that they are safe for promising steps too. Here, we use the fact that certification runs are possible runs of the program, and the fact that certifications are promise-free.

Let us now formalise our key idea. First, we need a way to state that executions are safe for non-promising steps. This is expressed by the *non-promising safety* predicate defined in Fig. 8. What we want to conclude is that non-promising safety is enough to establish safety, as expressed by Theorem 1:

### **Theorem 1 (Non-promising safety implies safety)**

$$\forall n, \sigma, B, \mathcal{W}.\ \mathsf{npsafe}_{(n+1,0)}(\sigma, B)(\mathcal{W}) \subseteq \mathsf{safe}_n(\sigma, B)(\mathcal{W})$$

We now discuss several important points in the definition of non-promising safety which enable us to prove this theorem.

*Non-promising Safety is Indexed by Pairs of Natural Numbers.* When proving Theorem 1, we use promise-free certification runs to establish the safety of the promise steps. A problem we face here is that the length of certification runs is unbounded. Somehow, we have to know that whenever the thread makes a step, it is npsafe for arbitrarily many steps. Our solution is to have npsafe transfinitely indexed over pairs of natural numbers ordered lexicographically. That way, if we are npsafe at index (n + 1, 0) and we take a step, we know that we are npsafe at index (n, m) for every m. We are then free to choose a sufficiently large m depending on the length of the certification run we are considering.
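The mechanics of this lexicographic trick are easy to see in miniature. The following Python sketch is our own illustration (all names are ours; the paper's development is purely proof-theoretic): a step from index (n + 1, 0) may land at (n, m) for an m chosen only after seeing the certification run, yet the order remains well-founded.

```python
def lex_less(p, q):
    """Strict lexicographic order on pairs of natural numbers."""
    return p[0] < q[0] or (p[0] == q[0] and p[1] < q[1])

# From index (n + 1, 0) a reduction step may drop to (n, m) for ANY m:
# m can be picked only after we learn how long the certification run is.
n = 3
for m in (0, 1, 10, 10**6):
    assert lex_less((n, m), (n + 1, 0))

# Within a fixed first component only finitely many descents remain,
# which is why the induction still terminates.
assert not lex_less((n, 5), (n, 5))
```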

*Non-promising Safety Considers Configurations that May Contain Promises.* It is important to note that the definition of non-promising safety does not require that there are no promises in the starting configuration. The only thing that is required is that no more promises are going to be issued. This is very important for Theorem 1, since safety considers all possible starting configurations (including ones with existing promises), and if we want the theorem to hold, non-promising safety has to consider all possible starting configurations too.

*Erasure Used in the Non-promising Safety does not Constrain Promises of the Current Thread.* Non-promising safety does not require promises by the thread being reduced (i.e., thread i) to respect resources. Thus, when reasoning about non-promising safety of thread i, we cannot assume that existing promises by thread i respect resources, but crucially we also do not have to worry about recertifying thread i's promises. However, since the $\overset{\mathsf{NP}}{\longrightarrow}$ reduction does not recertify promises, we explicitly require that the promises are well formed (via the wfprom predicate) in order to ensure that we still only consider executions where threads do not read from their own promises.

*Additional Constraints by the Non-promising Safety.* Non-promising safety also imposes additional constraints on the reducing thread i. In particular, any write permissions owned or acquirable by i after the reduction were already owned or acquirable by i before the reduction step. Intuitively, this holds because thread i can only transfer away resources and take ownership of resources it was already allowed to acquire before reducing. Lastly, non-promising safety requires that if the reduction of i performs any new writes or fulfils any old promises, it must own the write permission for the location of the given message. Together, these two conditions ensure that if a promise is fulfilled during a thread-local certification

**Fig. 8.** Non-promising safety.

and the thread satisfies non-promising safety, then the thread already owned or could acquire the write permission for the location of the promise. This is expressed formally in Lemma 2.

**Lemma 2.** *Assuming that* $(M,P,V,r) \in \mathsf{npsafe}_{(n+1,k)}(\sigma, B)(\mathcal{W})$ *and* $\langle M,P\rangle \in [r_F[i \mapsto r \bullet f], u, \mathcal{W}]_{\{i\}}$ *and* $\langle\sigma, V, M, P\rangle \overset{\mathsf{NP}}{\longrightarrow}{}^{k}_{i} \langle\sigma', V', M', P'\rangle$ *and* $m \in (M' \setminus P') \setminus (M \setminus P)$*, we have* $(r \bullet \mathsf{canAcq}(r, u)).\mathit{wr}(m.\mathit{loc}).\mathit{perm} > 0$*.*

The intuition for why Lemma 2 holds is that, since only thread i executes, we know by the definition of non-promising safety that any write permission owned or acquirable by i when the promise is fulfilled was already owned or acquirable by i in the initial state. Furthermore, whenever a promise is fulfilled, the non-promising safety definition explicitly requires ownership of the corresponding write permission. It follows that the thread already owns or can acquire the write permission for the location of the given promise in the initial state.

Lemma 2 gives us exactly the property that we need to re-establish erasure after the operational semantics introduces a new promise. This makes Lemma 2 the key step in the proof of Theorem 1, which allows us to disentangle reasoning about promising steps and normal reduction steps. Theorem 1 tells us that, in order to prove a proof rule sound, it is enough to prove that non-promising safety holds for arbitrary indices. This liberates us from the cumbersome reasoning about promise steps and allows us to focus on non-promising reduction steps when proving the proof rules sound.

We can now state our top-level correctness theorem, Theorem 2. Since our language only has top-level parallel composition, we need a way to distribute initial resources to the various threads, and to collect all the resources once all the threads have finished. The correctness theorem gives us precisely that:

**Theorem 2 (Correctness).** *If* A *is a finite set of locations and*

*1.* $\forall x \in A.\ \varphi_x(0)$
*2.* $\ast_{x \in A}\,\big(\mathsf{Rel}(x, \varphi_x) \ast \mathsf{Acq}(x, \varphi_x) \ast \mathsf{W}_1(x, \{(0, 0)\})\big) \vdash \ast_{i \in \mathit{Tid}}\, P_i$
*3.* $\vdash \{P_i\}\ s_i\ \{Q_i\}$ *for all* $i$
*4.* $\langle\lambda i.\,(\mu_i, s_i), \bot, M_0, \emptyset\rangle \Longrightarrow^{*} \langle \mathit{TS}, \mathit{gconf}\rangle$ *and* $\mathit{TS}(i).\sigma = \mathsf{skip}$ *for all* $i$
*5.* $\ast_{i \in \mathit{Tid}}\, Q_i \vdash Q$
*6.* $\mathit{FRV}(Q_i) \cap \mathit{FRV}(Q_j) = \emptyset$ *for all distinct* $i, j \in \mathit{Tid}$

*then there exist* $\mu$, $r$*, and* $\mathcal{W}$ *such that* $(\mathit{gconf}.M,\ \bigsqcup_i \mathit{TS}(i).V,\ r) \in \|Q\|_{\mu}^{[]}(\mathcal{W})$ *and* $\forall i \in \mathit{Tid}.\ \forall a \in \mathit{FRV}(Q_i).\ \mu(a) = \mathit{TS}(i).\mu(a)$*, where* $\mathit{FRV}(P)$ *denotes the set of free register variables in* $P$*.*

### **5 Related Work**

There are a number of techniques for reasoning under relaxed memory models, but besides the DRF theorems and some simple invariant logics [10,13], no other techniques have been proved sound for a model allowing the weak behaviour of LB+data+fakedep from the introduction. The "invariant-based program logics" are by design unable to reason about programs like the random number generator, where having a bound on the set of values written to a location is not enough, let alone reasoning about functional correctness of a program.

*Relaxed Separation Logic (RSL).* Among program logics for relaxed memory, the most closely related is RSL [27]. There are two versions of RSL: a weak one that is sound with respect to the C/C++11 memory model, which features out-of-thin-air reads, and a stronger one that is sound with respect to a variant of the C/C++11 memory model that forbids load buffering.

The weak version of RSL forbids relaxed writes completely, and does not constrain the value returned by a relaxed read. The stronger version provides single-location invariants for relaxed accesses, but its soundness proof relies strongly on a strengthened version of C/C++11 without *po* ∪ *rf* cycles (where *po* is program order, and *rf* is the reads-from relation), which forbids load buffering.

When it comes to reasoning about coherence properties, even the strong version of RSL is surprisingly weak: it cannot be used to verify any of the coherence examples in this paper. In fact, RSL can be shown sound with respect to much weaker coherence axioms than what C/C++11 relaxed accesses provide.

One notable feature of RSL which we do not support is read-modify-write (RMW) instructions (such as compare-and-swap and fetch-and-add). However, the soundness proof of SLR makes no simplifying assumptions about the promising semantics which would affect the semantics of RMW instructions. Therefore, we are confident that enhancing SLR with rules for RMW instructions would not substantially affect the structure of the soundness proof, presented in Sect. 4.

*Other Program Logics.* FSL [8] extends (the strong version of) RSL with stronger rules for relaxed accesses in the presence of release/acquire fences. In FSL, a release fence can be used to package an assertion with a modality, which a relaxed write can then transfer. Conversely, the ownership obtained by a relaxed read is guarded by a symmetric modality that needs an acquire fence to be unpacked. The soundness proof of FSL also relies on *po* ∪ *rf* acyclicity. Moreover, it is known to be unsound in models where load buffering is allowed [9, Sect. 5.2].

A number of other logics—GPS [26], iGPS [12], OGRA [16], iCAP-TSO [24], the rely-guarantee proof system for TSO of Ridge [23], and the program logic for TSO of Wehrman and Berdine [28]—have been developed for even stronger memory models (release/acquire or TSO), and also rely quite strongly on—and try to expose—the stronger consistency guarantees provided by those models.

The framework of Alglave and Cousot [2] for reasoning about relaxed concurrent programs is parametric with respect to an axiomatic "per-execution" memory model. By construction, as argued by Batty et al. [3], such models cannot be used to define a language-level model allowing the weak behaviour of LB+data+fakedep and similar litmus tests while forbidding out-of-thin-air behaviours. Moreover, their framework does not provide the usual abstraction facilities of program logics.

The lace logic of Bornat et al. [6] targets hardware memory models, in particular Power. It relies on annotating the program with "per-execution" constraints, and on syntactic features of the program. For example, it distinguishes LB+data+fakedep from LB+data+po, its variant where the write of the second thread is [x]rlx := 1, and is thus unsuitable for addressing out-of-thin-air behaviours.

*Other Approaches.* Besides program logics, another way to reason about programs under weak memory models is to reduce reasoning under a memory model M to reasoning under a stronger model M′—typically, but not necessarily, sequential consistency [7,18]. One can often establish DRF theorems stating that a program without any races when executed under M′ has the same behaviours when executed under M as when executed under M′. For the promising semantics, Kang et al. [13, Sect. 5.4] have established such theorems for M′ being release-acquire consistency, sequential consistency, and the promise-free promising semantics, for suitable notions of races. The last one, the "Promise-Free DRF" theorem, is applicable to the Disjoint-Lists program from the introduction, but none of these theorems can be applied to any of the other examples of this paper, as they are racy. Moreover, these theorems are not compositional, as they do not state anything about the Disjoint-Lists program when put inside a larger, racy program—for example, just an extra read of a from another thread.

### **6 Conclusion**

In this paper, we have presented the first expressive logic that is sound under the promising semantics, and have demonstrated its expressiveness with a number of examples. Our logic can be seen both as a general proof technique for reasoning about concurrent programs, and also as a tool for proving the absence of out-of-thin-air behaviour for challenging examples, and for reasoning about coherence. In the future, we would like to extend the logic to cover more of relaxed memory and more advanced reasoning principles, such as those available in GPS [26], and to mechanise its soundness proof.

Interesting aspects of relaxed memory we would like to also cover are read-modify-writes and fences. These would allow us to consider concurrent algorithms like circular buffers and the atomic reference counter verified in FSL++ [9]. This could be done by adapting the corresponding rules of RSL and GPS; moreover, we could adapt them with our new approach to reason about coherence.

To mechanise the soundness proof, we intend to use the Iris framework [11], which has already been used to prove the soundness of iGPS [12], a variant of the GPS program logic. To do this, however, we have to overcome one technical limitation of Iris. Namely, the current version of Iris is step-indexed over $\mathbb{N}$, while our semantics uses transfinite step-indexing over $\mathbb{N} \times \mathbb{N}$ to define non-promising safety and to allow us to reason about certifications of arbitrary length for each reduction step. Progress has been made towards transfinitely step-indexed logical relations that may be applicable to a transfinitely step-indexed version of Iris [25].

**Acknowledgments.** We would like to thank the reviewers for their feedback. The research was supported in part by the Danish Council for Independent Research (project DFF – 4181-00273), by a European Research Council Consolidator Grant for the project "RustBelt" (grant agreement no. 683289), and by Len Blavatnik and the Blavatnik Family foundation.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Logical Reasoning for Disjoint Permissions**

Xuan-Bach Le1(B) and Aquinas Hobor1,2

<sup>1</sup> National University of Singapore, Singapore, Singapore bachdylan@gmail.com <sup>2</sup> Yale-NUS College, Singapore, Singapore

**Abstract.** Resource sharing is a fundamental phenomenon in concurrent programming where several threads have permissions to access a common resource. Logics for verification need to capture the notion of permission ownership and transfer. One typical practice is the use of rational numbers in (0, 1] as permissions in which 1 is the full permission and the rest are fractional permissions. Rational permissions are not a good fit for separation logic because they remove the essential "disjointness" feature of the logic itself. We propose a general logic framework that supports permission reasoning in separation logic while preserving disjointness. Our framework is applicable to sophisticated verification tasks such as doing induction over the finiteness of the heap within the object logic or carrying out biabductive inference. We can also prove precision of recursive predicates within the object logic. We developed the ShareInfer tool to benchmark our techniques. We introduce "scaling separation algebras," a compositional extension of separation algebras, to model our logic, and use them to construct a concrete model.

### **1 Introduction**

The last 15 years have witnessed great strides in program verification [7,27,39, 43,44,46]. One major area of focus has been concurrent programs following Concurrent Separation Logic (CSL) [40]. The key rule of CSL is Parallel:

$$\frac{\{P_1\}\ c_1\ \{Q_1\} \qquad \{P_2\}\ c_2\ \{Q_2\}}{\{P_1 \star P_2\}\ c_1\,||\,c_2\ \{Q_1 \star Q_2\}}\ \textsc{Parallel}$$

In this rule, we write $c_1 || c_2$ to indicate the parallel execution of commands $c_1$ and $c_2$. The separating conjunction ⋆ indicates that the resources used by the threads are disjoint in some useful way, *i.e.* that there are no dangerous races. Many subsequent program logics [18,20,30,31,45] have introduced increasingly sophisticated notions of "resource disjointness" for the Parallel rule.

Fractional permissions (also called "shares") are a relatively simple enhancement to separation logic's original notion of disjointness [4]. Rather than owning a resource (e.g. a memory cell) entirely, a thread is permitted to own a part/fraction of that resource. The more of a resource a thread owns, the more actions it is permitted to take; the mapping from shares to permitted actions is called a *policy*. In this paper we use the original policy of Bornat [4] to keep the examples straightforward: non-zero ownership of a memory cell permits reading, while full ownership also permits writing. More modern logics allow for a variety of more flexible share policies [13,28,42], but our techniques still apply. Fractional permissions are less expressive than the "protocol-based" notions of disjointness used in program logics such as FCSL [38,44], Iris [30], and TaDa [16], but are well-suited for common concurrent programming patterns such as read sharing, and so have been incorporated into many program logics and verification tools [19,26,28,31,36,41].

Since fractionals are simpler and more uniform than protocol-based logics, they are amenable to automation [26,33]. However, previous techniques had difficulty with the inductive predicates common in SL proofs. We introduce *predicate multiplication*, a concise method for specifying the fractional sharing of complex predicates, writing π · P to indicate that we own the π-share of the arbitrary predicate P, *e.g.* 0.5 · tree(x) indicates a tree rooted at x and we own half of each of the nodes in the tree. If set up properly, predicate multiplication handles inductive predicates smoothly and is well-suited for automation because:

Section 3 it distributes over bientailments—*e.g.* $\pi \cdot (P \wedge Q) \dashv\vdash (\pi \cdot P) \wedge (\pi \cdot Q)$—enabling rewriting techniques and both forwards and backwards reasoning;
Section 4 it works smoothly with the inference process of biabduction [10]; and
Section 5 the side conditions required for bientailments and biabduction can be verified directly in the object logic, leveraging existing entailment checkers.

There has been significant work in recent years on tool support for protocol-based approaches [15,19,29,30,48], but they require significant user input and provide essentially no inference. Fractional permissions and protocol-based approaches are thus complementary: fractionals can handle large amounts of relatively simple concurrent code with minimal user guidance, while protocol-based approaches are useful for reasoning about the implementations of fine-grained concurrent data structures whose correctness argument is more sophisticated.

In addition to Sects. 3, 4 and 5, the rest of this paper is organized as follows.

Section 2 We give the technical background necessary for our work.


**Fig. 1.** This heap satisfies tree(root, 0.3) despite being a DAG

### **2 Technical Preliminaries**

*Share Models.* An (additive) share model $(\mathcal{S}, \oplus)$ is a partial commutative monoid with a bottom/empty element $\mathcal{E}$ and a top/full element $\mathcal{F}$. On the rationals in [0, 1], ⊕ is partial addition (undefined when the sum exceeds 1), $\mathcal{E}$ is 0, and $\mathcal{F}$ is 1. We also require the existence of complements $\bar{\pi}$ satisfying $\pi \oplus \bar{\pi} = \mathcal{F}$; in $\mathbb{Q}$, $\bar{\pi} \stackrel{\text{def}}{=} 1 - \pi$.

*Separation Logic.* Our base separation logic has the following connectives:

$$P, Q, \text{etc.} \stackrel{\text{def}}{=} \langle \mathcal{F} \rangle \mid P \wedge Q \mid P \vee Q \mid \neg P \mid P \star Q \mid \forall x.\, P \mid \exists x.\, P \mid \mu X.\, P \mid e_1 \stackrel{\pi}{\mapsto} e_2$$

Pure facts $\mathcal{F}$ are put in angle brackets, *e.g.* $\langle\mathit{even}(12)\rangle$. Pure facts force the empty heap, *i.e.* the usual separation logic emp predicate is just a macro for $\langle\top\rangle$. Our propositional fragment has (classical) conjunction ∧, disjunction ∨, negation ¬, and the separating conjunction ⋆. We have both universal ∀ and existential ∃ quantifiers, which can be impredicative if desired. To construct recursive predicates we have the usual Tarski least fixpoint μ. The fractional points-to $e_1 \stackrel{\pi}{\mapsto} e_2$ means we own the π-fraction of the memory cell pointed to by $e_1$, whose contents is $e_2$, and nothing more. To distinguish points-to from emp we require that π be non-$\mathcal{E}$. For notational convenience we sometimes elide the full share $\mathcal{F}$ over a fractional maps-to, writing just $e_1 \mapsto e_2$. The connection of ⊕ to the fractional maps-to predicate is given by the bi-entailment:

$$e \stackrel{\pi_1}{\mapsto} e_1 \,\star\, e \stackrel{\pi_2}{\mapsto} e_2 \ \dashv\vdash\ e \stackrel{\pi_1 \oplus \pi_2}{\longmapsto} e_1 \wedge e_1 = e_2 \qquad \textsc{MapsTo Split}$$

*Disjointness.* Although intuitive, the rationals are not a good model for shares in SL. Consider this definition for π-fractional trees rooted at x:

$$\mathsf{tree}(x,\pi) \stackrel{\text{def}}{=} \langle x = \mathsf{null} \rangle \vee \exists d, l, r. \, x \stackrel{\pi}{\mapsto} \langle d, l, r \rangle \star \mathsf{tree}(l, \pi) \star \mathsf{tree}(r, \pi) \tag{1}$$

This tree predicate is obtained directly from the standard recursive predicate for binary trees by asserting only π ownership of the root and recursively doing the same for the left and right substructures, and so at first glance looks straightforward<sup>1</sup>. The problem is that when π ∈ (0, 0.5], tree can describe some

<sup>1</sup> We write $x \stackrel{\pi}{\mapsto} (v_1, \ldots, v_n)$ for $x \stackrel{\pi}{\mapsto} v_1 \star (x+1) \stackrel{\pi}{\mapsto} v_2 \star \cdots \star (x+n-1) \stackrel{\pi}{\mapsto} v_n$.

non-tree directed acyclic graphs as in Fig. 1. Fractional trees are a little too easy to introduce and thus unexpectedly painful to eliminate.

To prevent the deformation of recursive structures shown in Fig. 1, we want to recover the "disjointness" property of basic SL: $e \mapsto e_1 \star e \mapsto e_2 \vdash \bot$. Disjointness can be specified either as an inference rule in separation logic [41] or as an algebraic rule on the share model [21] as follows:

$$e \stackrel{\pi}{\mapsto} e_1 \,\star\, e \stackrel{\pi}{\mapsto} e_2 \ \vdash\ \bot \qquad\qquad \forall a, b.\ a \oplus a = b \ \Rightarrow\ a = \mathcal{E}$$

In other words, **a nonempty share** π **cannot join with itself**. In Sect. 3 we will see how disjointness enables the distribution of predicate multiplication over ⋆, and in Sect. 4 we will see how disjointness enables antiframe inference during biabduction.
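To see the problem concretely, here is a small Python illustration (ours, not from the paper) of the rational share model: a non-empty share joins with itself, which is exactly what lets the two "disjoint" subtrees in Fig. 1 alias.

```python
from fractions import Fraction

def join_q(a, b):
    """Rational-share join: partial addition on [0, 1]; None = undefined."""
    s = a + b
    return s if s <= 1 else None

pi = Fraction(3, 10)
# 0.3 ⊕ 0.3 is defined, so two pointers each carrying a 0.3 share of the
# same cell can be ⋆-separated: tree(root, 0.3) admits the DAG of Fig. 1.
assert join_q(pi, pi) == Fraction(3, 5)

# Disjointness demands: a ⊕ a defined ⇒ a = E. The rationals violate
# this for every a in (0, 0.5], which is what deforms recursive predicates.
assert all(join_q(a, a) is not None
           for a in (Fraction(1, 4), Fraction(1, 2)))
```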

*Tree Shares.* Dockins *et al.* [21] proposed "tree shares" as a share model satisfying disjointness. For this paper the details of the model are not critical, so we provide only a brief overview. A tree share $\tau \in \mathbb{T}$ is a binary tree with Boolean leaves, *i.e.* $\tau ::= \bullet \mid \circ \mid (\tau_1, \tau_2)$, where ◦ is the empty share $\mathcal{E}$ and • is the full share $\mathcal{F}$. There are two "half" shares, $(\circ, \bullet)$ and $(\bullet, \circ)$, and four "quarter" shares, *e.g.* $((\bullet, \circ), \circ)$. Trees must be in *canonical form*, *i.e.*, the most compact representation under $\cong$:

$$(\bullet, \bullet) \cong \bullet \qquad\quad (\circ, \circ) \cong \circ \qquad\quad \textit{e.g.}\ \ ((\bullet, \bullet), \circ) \cong (\bullet, \circ)$$

Union $\sqcup$, intersection $\sqcap$, and complement $\bar{\cdot}$ are the basic operations on tree shares; they operate leafwise after unfolding the operands under $\cong$ into the same shape:

$$(\bullet, \circ) \sqcup (\circ, (\bullet, \circ)) = (\bullet, (\bullet, \circ)) \qquad\qquad (\bullet, \circ) \sqcap (\circ, (\bullet, \circ)) = \circ$$

The structure $\langle \mathbb{T}, \sqcup, \sqcap, \bar{\cdot}, \circ, \bullet\rangle$ forms a countable atomless Boolean algebra and thus enjoys decidable existential and first-order theories with precisely known complexity bounds [34]. The join operator ⊕ on trees is defined by $\tau_1 \oplus \tau_2 = \tau_3 \stackrel{\text{def}}{=} \tau_1 \sqcup \tau_2 = \tau_3 \,\wedge\, \tau_1 \sqcap \tau_2 = \circ$. Due to their good metatheoretic and computational properties, a variety of program logics [24,25] and verification tools [3,26,33,47] have used tree shares (or other isomorphic structures [19]).
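These operations are easy to prototype. The following Python sketch is our own encoding (True for •, False for ◦, pairs for internal nodes; all function names are ours) of canonicalisation, leafwise ⊔ and ⊓, and the derived partial join ⊕:

```python
def canon(t):
    """Canonical form: collapse (x, x) into the Boolean leaf x."""
    if isinstance(t, tuple):
        l, r = canon(t[0]), canon(t[1])
        return l if isinstance(l, bool) and l == r else (l, r)
    return t

def unfold(a, b):
    """Unfold two trees under the congruence into a common shape."""
    if isinstance(a, bool) and isinstance(b, bool):
        return a, b
    a = (a, a) if isinstance(a, bool) else a
    b = (b, b) if isinstance(b, bool) else b
    l, r = unfold(a[0], b[0]), unfold(a[1], b[1])
    return (l[0], r[0]), (l[1], r[1])

def union(a, b):
    a, b = unfold(a, b)
    return (a or b) if isinstance(a, bool) else \
        canon((union(a[0], b[0]), union(a[1], b[1])))

def inter(a, b):
    a, b = unfold(a, b)
    return (a and b) if isinstance(a, bool) else \
        canon((inter(a[0], b[0]), inter(a[1], b[1])))

def join(a, b):
    """tau1 ⊕ tau2: defined only when tau1 ⊓ tau2 = ◦ (disjointness)."""
    return union(a, b) if inter(a, b) is False else None

E, F = False, True
L, R = (True, False), (False, True)   # the two "half" shares
assert join(L, R) == F                # halves recombine to the full share
assert join(L, L) is None             # a non-empty share never joins itself
```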

### **3 Predicate Multiplication**

The additive structure of share models is relatively well-understood [21,33,34]. The focus for this paper is exploring the benefits and consequences of incorporating a multiplicative operator ⊗ into a share model. The simplest motivation for multiplication is computationally dividing some share π of a resource "in half;"

```
 1 struct tree { int d; struct tree* l; struct tree* r; };
 2 void processTree(struct tree* x) {
 3   if (x == 0) { return; }
 4   print(x->d);          ||   7  print(x->d);
 5   processTree(x->l);    ||   8  processTree(x->l);
 6   processTree(x->r);    ||   9  processTree(x->r);
10 }
```
**Fig. 2.** The parallel processTree function, written in a C-like language

the two halves of the resource are then given to separate threads for parallel processing. When shares themselves are rationals, ⊗ is just ordinary multiplication, *e.g.* we can divide 0.6 = (0.5 ⊗ 0.6) ⊕ (0.5 ⊗ 0.6). Defining a notion of multiplication on a share model that satisfies disjointness is somewhat trickier, but we can do so with tree shares $\mathbb{T}$ as follows. Define $\tau_1 \otimes \tau_2$ to be the operation that replaces each • in $\tau_2$ with a copy of $\tau_1$, *e.g.* $(\circ, \bullet) \otimes (\bullet, \circ) = ((\circ, \bullet), \circ)$. The structure

$(\mathbb{T}, \oplus, \otimes)$ is a kind of "near-semiring." The ⊗ operator is associative, has identity $\mathcal{F}$ and null point $\mathcal{E}$, and is right distributive, *i.e.* $(a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c)$. It is not commutative, does not distribute on the left, and does not have inverses. It is hard to do better: adding axioms like multiplicative inverses forces any model satisfying disjointness ($\forall a, b.\ a \oplus a = b \Rightarrow a = \mathcal{E}$) to have no more than two elements (Sect. 8).
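The ⊗ operator, too, is a few lines of code. The sketch below is our own self-contained Python illustration (True for •, False for ◦, pairs for nodes; all names are ours); it checks the algebraic claims above, together with the ⊕/⊗ interaction that the verification of processTree relies on:

```python
def canon(t):
    """Canonical form: collapse (x, x) into the Boolean leaf x."""
    if isinstance(t, tuple):
        l, r = canon(t[0]), canon(t[1])
        return l if isinstance(l, bool) and l == r else (l, r)
    return t

def leafwise(op, a, b):
    """Apply op leafwise after unfolding both trees to a common shape."""
    if isinstance(a, bool) and isinstance(b, bool):
        return op(a, b)
    a = (a, a) if isinstance(a, bool) else a
    b = (b, b) if isinstance(b, bool) else b
    return canon((leafwise(op, a[0], b[0]), leafwise(op, a[1], b[1])))

def join(a, b):
    """tau1 ⊕ tau2: defined only when the trees share no • leaf."""
    if leafwise(lambda x, y: x and y, a, b) is not False:
        return None
    return leafwise(lambda x, y: x or y, a, b)

def mult(t1, t2):
    """tau1 ⊗ tau2: replace every • leaf of tau2 by a copy of tau1."""
    if isinstance(t2, bool):
        return t1 if t2 else False
    return canon((mult(t1, t2[0]), mult(t1, t2[1])))

F = True
L, R = (True, False), (False, True)
pi = (True, (False, True))            # an arbitrary sample share (ours)

assert mult(F, pi) == pi and mult(pi, F) == pi   # F is the identity
assert mult(L, R) != mult(R, L)                  # ⊗ is not commutative

# Right distributivity in the form used to split a tree between two
# threads: (L ⊗ pi) ⊕ (R ⊗ pi) = (L ⊕ R) ⊗ pi = pi.
assert join(L, R) == F
assert join(mult(L, pi), mult(R, pi)) == pi
```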

Now consider the toy program in Fig. 2. Starting from the tree rooted at x, the program itself is dead simple. First (line 3) we check whether x is null, *i.e.* whether we have reached a leaf; if so, we return. If not, we split into parallel threads (lines 4–6 and 7–9) that do some processing on the root data in both branches. In the toy example, the processing just prints out the root data (lines 4 and 7); the print command is unimportant: what is important is that we somehow access some of the data in the tree. After processing the root, both parallel branches call the processTree function recursively on the left x->l (lines 5 and 8) and right x->r (lines 6 and 9) subtrees, respectively. After both parallel processes have terminated, the function returns (line 10). The program is simple, so we would like its verification to be equally simple.

Predicate multiplication is the tool that leads to a simple proof. Specifically, we would like to verify that processTree has the specification:

$$\forall \pi, x. \; \{\pi \cdot \mathsf{tree}(x)\} \;\; \mathtt{processTree}(x) \;\; \{\pi \cdot \mathsf{tree}(x)\}$$

Here tree(x) ≝ x = null ∨ ∃d, l, r. x ↦ (d, l, r) ⋆ tree(l) ⋆ tree(r) is exactly the usual definition of binary trees in separation logic. Predicate multiplication has allowed us to isolate the fractional ownership from the definition; compare with Eq. (1) above. Our precondition and postcondition both say that x is a pointer to a heap-represented π-owned tree. Critically, we want to ensure that our π-share at the end of the program is equal to the π-share at the beginning. This way, if our initial caller had full F ownership before calling processTree, he will have full ownership afterwards (allowing him to *e.g.* deallocate the tree).

The intuition behind the proof is simple. First, in line 3, we check whether x is null; if so, we are in the base case of the tree definition and can simply return. If not, we can eliminate the left disjunct and proceed to split the ⋆-separated bits into disjoint subtrees l and r, dividing the ownership of those bits into two "halves". Let L ≝ (•, ◦) and R ≝ L̄ = (◦, •). When we start the parallel computation on lines 4 and 7, we want to pass the left branch of the computation the L ⊗ π share of the spatial resources, and the right branch the R ⊗ π share. In both branches we then need to show that we can read from the data cell, which in the simple policy we use for this paper boils down to making sure that the product of two non-E shares cannot be E. This is a basic property for reasonable share models with multiplication. In the remainder of the parallel code (lines 5–6 and 8–9) we need to make recursive calls, which is done simply by instantiating π with L ⊗ π and R ⊗ π in the recursive specification (as well as l and r for x). The latter half of the proof, after the parallel call, is pleasantly symmetric to the first half: we fold back the original tree predicate by merging the two halves L ⊗ π and R ⊗ π back into π. Consequently, we arrive at the postcondition π · tree(x), which is identical to the precondition.

#### **3.1 Proof Rules for Predicate Multiplication**

In Fig. 4 we present the formal verification of processTree, which follows the informal argument very closely. However, before we go through it, let us consider the reason for this alignment: the key rules for reasoning about predicate multiplication are bidirectional. These rules are given in Fig. 3. The non-spatial rules are all straightforward and follow the basic pattern that predicate multiplication both pushes into and pulls out of the operators of our logic without meaningful side conditions. The DotPure rule means that predicate multiplication ignores pure facts, too. Complicating the picture slightly, predicate multiplication pushes into implication ⇒ but does not pull out of it. Combining DotImpl with DotPure we get a one-way rule for negation: π · (¬P) ⊢ ¬π · P. We will explain why we cannot get both directions in Sects. 5.1 and 8.

Most of the spatial rules are also simple. Since emp is a pure assertion in our setting, DotPure yields π · emp ⊣⊢ emp. The DotFull rule says that F is the scalar identity on predicates, just as it is the multiplicative identity on the share model itself. The DotDot rule allows us to "collapse" repeated predicate multiplication using share multiplication; we will shortly see how we use it to verify the recursive calls to processTree. Similarly, the DotMapsTo rule shows how predicate multiplication combines with the basic maps-to predicate by multiplying the associated shares together. All three rules are bidirectional and require no side conditions.

While the last two rules are both bidirectional, they both have side conditions. The DotPlus rule shows how predicate multiplication distributes over ⊕. The splitting (left-to-right) direction does not require a side condition, but for the merging (right-to-left) direction we require that P be *precise* in the usual separation logic sense. Precision will be discussed

$$\begin{array}{lll}
\dfrac{P \vdash Q}{\pi \cdot P \vdash \pi \cdot Q}\ \text{DotPos} &
\pi \cdot P \dashv\vdash P \;\; (P \text{ pure})\ \text{DotPure} &
\pi \cdot (P \Rightarrow Q) \vdash (\pi \cdot P) \Rightarrow (\pi \cdot Q)\ \text{DotImpl} \\[1ex]
\pi \cdot (P \land Q) \dashv\vdash (\pi \cdot P) \land (\pi \cdot Q)\ \text{DotConj} &
\pi \cdot (P \lor Q) \dashv\vdash (\pi \cdot P) \lor (\pi \cdot Q)\ \text{DotDisj} &
\pi \cdot (\lnot P) \vdash \lnot\,(\pi \cdot P)\ \text{DotNeg} \\[1ex]
\dfrac{\tau \neq \emptyset}{\pi \cdot \forall x{:}\tau.\, P(x) \dashv\vdash \forall x{:}\tau.\, \pi \cdot P(x)}\ \text{DotUniv} &
\pi \cdot \exists x{:}\tau.\, P(x) \dashv\vdash \exists x{:}\tau.\, \pi \cdot P(x)\ \text{DotExis} &
\mathcal{F} \cdot P \dashv\vdash P\ \text{DotFull} \\[1ex]
\pi\_1 \cdot (\pi\_2 \cdot P) \dashv\vdash (\pi\_1 \otimes \pi\_2) \cdot P\ \text{DotDot} &
\pi \cdot (x \mapsto y) \dashv\vdash x \stackrel{\pi}{\longmapsto} y\ \text{DotMapsTo} &
\dfrac{\mathsf{precise}(P)}{(\pi\_1 \oplus \pi\_2) \cdot P \dashv\vdash (\pi\_1 \cdot P) \star (\pi\_2 \cdot P)}\ \text{DotPlus} \\[1ex]
\multicolumn{3}{c}{\dfrac{P \vdash \mathsf{uniform}(\pi') \qquad Q \vdash \mathsf{uniform}(\pi')}{\pi \cdot (P \star Q) \dashv\vdash (\pi \cdot P) \star (\pi \cdot Q)}\ \text{DotStar}}
\end{array}$$

**Fig. 3.** Distributivity of the scaling operator over pure and spatial connectives

in Sect. 5.2; for now a simple counterexample shows why it is necessary:

$$\mathcal{L} \cdot \big(x \mapsto a \lor (x{+}1) \mapsto b\big) \star \mathcal{R} \cdot \big(x \mapsto a \lor (x{+}1) \mapsto b\big) \;\not\vdash\; \mathcal{F} \cdot \big(x \mapsto a \lor (x{+}1) \mapsto b\big)$$

The premise is also consistent with x ↦^{L} a ⋆ (x+1) ↦^{R} b, in which the two disjuncts hold on different cells.

The DotStar rule shows how predicate multiplication distributes into and out of the separating conjunction ⋆. It is also bidirectional. **Crucially, the right-to-left direction fails on non-disjoint share models like** Q, which is the "deeper reason" for the deformation of recursive structures illustrated in Fig. 1. On disjoint share models like T, we get equational reasoning subject to the side condition of *uniformity*. Informally, P ⊢ uniform(π) asserts that any heap that satisfies P has the permission π uniformly at each of its defined addresses. In Sect. 8 we explain why we cannot admit this rule without a side condition.

In the meantime, let us argue that most predicates used in practice in separation logic are uniform. First, every SL predicate defined in a non-fractional setting, such as tree(x), is F-uniform. Second, P is π-uniform if and only if π′ · P is (π′ ⊗ π)-uniform. Third, the ⋆-conjunction of two π-uniform predicates is also π-uniform. Since a significant motivation for predicate multiplication is to allow standard SL predicates to be used in fractional settings, these already cover many common cases in practice. It is useful to consider examples of non-uniform predicates for contrast. Here are three (we elide the base cases):

$$\begin{array}{ll} \mathsf{slist}(x) & \stackrel{\text{def}}{=} \exists d, n. \left( (\langle d = 17 \rangle \star x \stackrel{\mathcal{L}}{\longmapsto} (d, n)) \vee (\langle d \neq 17 \rangle \star x \stackrel{\mathcal{R}}{\longmapsto} (d, n)) \right) \star \mathsf{slist}(n) \\ \mathsf{dlist}(x) & \stackrel{\text{def}}{=} \exists d, n.\; x \mapsto (d, n) \star \mathcal{L} \cdot \mathsf{dlist}(n) \\ \mathsf{dtree}(x) & \stackrel{\text{def}}{=} \exists d, l, r.\; x \mapsto (d, l, r) \star \mathcal{L} \cdot \mathsf{dtree}(l) \star \mathcal{R} \cdot \mathsf{dtree}(r) \end{array}$$

The slist(x) predicate owns different amounts of permissions at different memory cells depending on the value of those cells. The dlist(x) predicate owns decreasing amounts of the list, *e.g.* the first cell is owned more than the second, which is owned more than the third. The dtree(x) predicate is even stranger, owning different amounts of different branches of the tree, essentially depending on the

```
 1 void processTree(struct tree* x) {  // { π · tree(x) }
 2   // { π · (x = null ∨ ∃d,l,r. x ↦ (d,l,r) ⋆ tree(l) ⋆ tree(r)) }
 3   // { x = null ∨ ∃d,l,r. x ↦^π (d,l,r) ⋆ π·tree(l) ⋆ π·tree(r) }
 4   if (x == null) {                  // { x = null }
 5     return; }                       // { π · tree(x) }
 6   // { x ↦^π (d,l,r) ⋆ π·tree(l) ⋆ π·tree(r) }
 7   // { F · (x ↦^π (d,l,r) ⋆ π·tree(l) ⋆ π·tree(r)) }
 8   // { (L⊕R) · (x ↦^π (d,l,r) ⋆ π·tree(l) ⋆ π·tree(r)) }
 9   // { L·(x ↦^π (d,l,r) ⋆ π·tree(l) ⋆ π·tree(r)) ⋆ R·(x ↦^π (d,l,r) ⋆ π·tree(l) ⋆ π·tree(r)) }
10   // { L·(x ↦^π (d,l,r) ⋆ π·tree(l) ⋆ π·tree(r)) }   -- left thread; right is symmetric
11   // { L·(x ↦^π (d,l,r)) ⋆ L·(π·tree(l)) ⋆ L·(π·tree(r)) }
12   // { x ↦^{L⊗π} (d,l,r) ⋆ (L⊗π)·tree(l) ⋆ (L⊗π)·tree(r) }
13   print(x -> d);
14   processTree(x -> l);  processTree(x -> r);
15   // { x ↦^{L⊗π} (d,l,r) ⋆ (L⊗π)·tree(l) ⋆ (L⊗π)·tree(r) }
16   // { L·(π·(x ↦ (d,l,r))) ⋆ L·(π·tree(l)) ⋆ L·(π·tree(r)) }
17   // { L·(π·(x ↦ (d,l,r) ⋆ tree(l) ⋆ tree(r))) }
18   // { L·(π·(x ↦ (d,l,r) ⋆ tree(l) ⋆ tree(r))) ⋆ R·(π·(x ↦ (d,l,r) ⋆ tree(l) ⋆ tree(r))) }
19   // { (L⊕R)·(π·(x ↦ (d,l,r) ⋆ tree(l) ⋆ tree(r))) }
20 }                                   // { π · tree(x) }
```

**Fig. 4.** Reasoning with the scaling operator <sup>π</sup> · <sup>P</sup>.

path to the root. None of these predicates mix well with DotStar, but perhaps they are not needed to verify many programs in practice, either. In Sects. 5.1 and 5.2 we will discuss how to prove that predicates are precise and uniform. In Sect. 5.4 we will demonstrate our techniques by applying them to two examples.

## **3.2 Verification of** processTree **using predicate multiplication**

We now explain how the proof of processTree is carried out in Fig. 4 using the scaling rules of Fig. 3. In line 2, we unfold the definition of the predicate tree(x), which consists of one base case and one inductive case. We reach line 3 by pushing π inward using the rules DotPure, DotDisj, DotExis, DotMapsTo, and DotStar. To use DotStar we must prove that tree(x) is F-uniform, which we show how to do in Sect. 5.4. We prove this lemma once and use it many times.

The base case x = null is handled in lines 4–5 by applying the rule DotPure, *i.e.* x = null ⊣⊢ π · (x = null), and then DotPos, π · (x = null) ⊢ π · tree(x). For the inductive case, we first apply DotFull in line 7 and then replace F with L ⊕ R (recall that R is L's complement). On line 9 we use DotPlus to translate the split on shares with ⊕ into a split on heaps with ⋆.

We show only one parallel process; the other is a mirror image. Line 10 gives the precondition from the Parallel rule, and in lines 11 and 12 we continue to "push in" the predicate multiplication. Verifying the code in lines 13–14 just requires Frame. Notice that we need the DotDot rule to "collapse" the two uses of predicate multiplication into one so that we can apply the recursive specification (with the new π in the recursive precondition equal to L ⊗ π).

Having taken the predicate completely apart, it is now necessary to put Humpty Dumpty back together again. This is why it is vital that all of our proof rules are bidirectional: without bidirectionality we would not be able to reach the final postcondition π · tree(x). The final wrinkle is that for line 19 we must prove the precision of the tree(x) predicate. We show how to do so by example in Sect. 5.4, but typically in a verification this is proved once per predicate as a lemma.

### **4 Bi-abductive Inference with Fractional Permissions**

Bi-abduction is a separation logic inference process that helps to increase the scalability of verification for sizable programs [22,49]; in recent years it has been the focus of substantial research for (sequential) separation logic [8,10,11,32]. Bi-abduction aims to infer the missing information in an incomplete separation logic entailment. More precisely, given an incomplete entailment A ⋆ [??] ⊢ B ⋆ [??], we would like to find predicates for the two missing pieces [??] that complete the entailment in a nontrivial manner. The first piece is called the *antiframe* while the second is the *inference frame*. The standard approach consists of two sequential subroutines, namely *abductive inference* and *frame inference*, which construct the antiframe and frame respectively. Our task in this section is to show how to upgrade these routines to handle fractional permissions so that bi-abduction can extend to concurrent programs. As we will see, disjointness plays a crucial role in antiframe inference.

### **4.1 Fractional Residue Computation**

Consider the following fractional points-to bi-abduction problem over the rationals:

$$a \overset{\pi\_1}{\longmapsto} b \star [??] \vdash a \overset{\pi\_2}{\longmapsto} b \star [??]$$

There are three cases to consider, namely π1 = π2, π1 < π2, or π1 > π2. In the first case, both the (minimal) antiframe Fa and frame Ff are emp; the second case gives us Fa = a ↦ b with share π2 − π1 and Ff = emp, while the last case gives Fa = emp and Ff = a ↦ b with share π1 − π2. Here we straightforwardly compute the residue permission using rational subtraction. In general, one can attempt to define a subtraction ⊖ from a share model ⟨S, ⊕⟩ as a ⊖ b = c ≝ b ⊕ c = a. However, this definition is too coarse, as we want subtraction to be a total function so that the residue is always computable efficiently. A solution to this issue is to relax the requirements for ⊖, asking only that it satisfy the following two properties:

$$C\_1: a \oplus (b \ominus a) = b \oplus (a \ominus b) \qquad \quad C\_2: a \ll b \oplus c \Rightarrow a \ominus b \ll c$$

where a ≪ b ≝ ∃c. a ⊕ c = b. Condition C1 provides a convenient way to compute the fractional residue in both the frame and the antiframe, while C2 asserts that a ⊖ b is effectively the minimal element that, when joined with b, becomes at least a. In the rationals Q, a ⊖ b ≝ *if* (a > b) *then* a − b *else* 0. On tree shares T, a ⊖ b is the leafwise difference: a leaf of the result is • exactly when the corresponding leaf of a is • and that of b is ◦. Recalling that the case π1 = π2 is simple (both the antiframe and frame are just emp), when π1 ≠ π2 we can compute the fractional antiframe and inference frame uniquely using ⊖:

$$a \stackrel{\pi\_1}{\longmapsto} b \star a \stackrel{\pi\_2 \ominus \pi\_1}{\longmapsto} b \;\vdash\; a \stackrel{\pi\_2}{\longmapsto} b \star a \stackrel{\pi\_1 \ominus \pi\_2}{\longmapsto} b \quad \text{Msub}$$
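Over the rationals, the residue operator and the two conditions can be checked mechanically. A small sketch (the names `sub` and `below` are ours, standing for ⊖ and ≪):

```python
from fractions import Fraction as Q

def sub(a, b):
    """Residue operator ⊖ on rational shares: total, unlike exact subtraction."""
    return a - b if a > b else Q(0)

def below(a, b):
    """a ≪ b iff some c joins with a to give b; on shares in [0,1] this is a <= b."""
    return a <= b

shares = [Q(n, 4) for n in range(5)]          # 0, 1/4, 1/2, 3/4, 1
for a in shares:
    for b in shares:
        # C1: a ⊕ (b ⊖ a) == b ⊕ (a ⊖ b)  (both sides equal max(a, b))
        assert a + sub(b, a) == b + sub(a, b)
        for c in shares:
            # C2: a ≪ b ⊕ c  implies  a ⊖ b ≪ c
            if below(a, b + c):
                assert below(sub(a, b), c)
```

Because `sub` is total, the residue needed by Msub and Psub is always computable, which is exactly what the relaxation of ⊖ was designed to guarantee.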

Generally, the following rule helps compute the residue of predicate P:

$$\frac{\mathsf{precise}(P)}{\pi\_1 \cdot P \star (\pi\_2 \ominus \pi\_1) \cdot P \vdash \pi\_2 \cdot P \star (\pi\_1 \ominus \pi\_2) \cdot P} \text{ Psub}$$

Using C1 and C2 it is easy to prove that the residue is minimal w.r.t. ≪, *i.e.*:

$$
\pi\_1 \oplus a = \pi\_2 \oplus b \Rightarrow \pi\_2 \ominus \pi\_1 \ll a \land \pi\_1 \ominus \pi\_2 \ll b
$$

#### **4.2 Extension of Predicate Axioms**

To support reasoning over recursive data structures such as lists or trees, the assertion language is enriched with the corresponding inductive predicates. To derive properties of inductive predicates, verification tools often contain a list of predicate axioms/facts and use them to aid the verification process [9,32]. These facts are represented as entailment rules A ⊢ B that can be classified into "folding" and "unfolding" rules that manipulate the representation of inductive predicates. For example, some axioms for the tree predicate are:

$$\begin{array}{ll} F\_1: x = 0 \land \mathsf{emp} \vdash \mathsf{tree}(x) & F\_2: x \mapsto (v, x\_1, x\_2) \star \mathsf{tree}(x\_1) \star \mathsf{tree}(x\_2) \vdash \mathsf{tree}(x) \\\ U: \mathsf{tree}(x) \land x \neq 0 \vdash \exists v, x\_1, x\_2. \; x \mapsto (v, x\_1, x\_2) \star \mathsf{tree}(x\_1) \star \mathsf{tree}(x\_2) \end{array}$$

We want to transform these axioms into fractional forms. The key ingredient is the DotPos rule from Fig. 3, which lifts an entailment to its fractional counterpart, *i.e.* (P ⊢ Q) ⇒ (π · P ⊢ π · Q). Using this and the other scaling rules from Fig. 3, we can upgrade the folding/unfolding rules into corresponding fractional forms:

$$\begin{array}{c} F\_1': x = 0 \land \mathsf{emp} \vdash \pi \cdot \mathsf{tree}(x) \quad F\_2': x \stackrel{\pi}{\longmapsto} (v, x\_1, x\_2) \star \pi \cdot \mathsf{tree}(x\_1) \star \pi \cdot \mathsf{tree}(x\_2) \vdash \pi \cdot \mathsf{tree}(x) \\ U': \pi \cdot \mathsf{tree}(x) \land x \neq 0 \vdash \exists v, x\_1, x\_2. \; x \stackrel{\pi}{\longmapsto} (v, x\_1, x\_2) \star \pi \cdot \mathsf{tree}(x\_1) \star \pi \cdot \mathsf{tree}(x\_2) \end{array}$$

As our scaling rules are bidirectional, they can be applied both in the antecedent and in the consequent to produce a smooth transformation to fractional axioms. Also, recall that our DotStar rule π · (P ⋆ Q) ⊣⊢ (π · P) ⋆ (π · Q) has the side condition that both P and Q are π′-uniform. This condition is trivial in the transformation, as standard predicates (*i.e.* those without permissions) are automatically F-uniform. Furthermore, the precision and uniformity properties transfer directly to fractional forms by the following rules:

precise(π · P) ⇔ precise(P)    (P ⊢ uniform(π)) ⇔ (π′ · P ⊢ uniform(π′ ⊗ π))

#### **4.3 Abductive Inference and Frame Inference**

To construct the antiframe, Calcagno *et al.* [10] presented a general framework for antiframe inference which contains rules of the form:

$$\frac{\Delta' \star [M'] \rhd H' \qquad \text{Cond}}{\Delta \star [M] \rhd H}$$

where Cond is a side condition, (H, H′) are consequents, (Δ, Δ′) heap formulas, and (M, M′) antiframes. In principle, the abduction algorithm gradually matches fragments of the consequent with the antecedent, deriving sound equalities among variables while applying various folding and unfolding rules for recursive predicates on both sides of the entailment. Ideally, the remaining unmatched fragments of the consequent are returned to form the antiframe. During the process, certain conditions need to be maintained, *e.g.* satisfiability of the antecedent or a minimal choice of antiframe. After the antiframe is found, frame inference is invoked to construct the inference frame. In principle, the old antecedent is first combined with the antiframe to form a new antecedent whose fragments are matched against the consequent. Eventually, the remaining unmatched fragments of the antecedent are returned to construct the inference frame.

The discussion of fractional residue computation in Sect. 4.1 and the extension of recursive predicate rules in Sect. 4.2 ensure a smooth upgrade of the bi-abduction algorithm to fractional form. We demonstrate this intuition using the example in Fig. 5. The partial consequent is a fractional tree(x) predicate with permission π3, while the partial antecedent is the star-conjunction of a fractional maps-to predicate for address x with permission π1, a fractional tree(x1) predicate with permission π2, and a null pointer x2. Following the spirit of Calcagno *et al.* [10], the steps in both sub-routines include applying the folding and unfolding rules for the predicate tree and then matching the corresponding pairs of fragments from antecedent and consequent. The upgraded part is reflected in the use of the two new rules Msub and Psub to compute the fractional residues, as well as a more general system of folding and unfolding rules for the predicate tree. We are then able to compute the antiframe a = x1 ∧ (π3 ⊖ π2) · tree(x1) ⋆ x ↦^{π3 ⊖ π1} (v, a, x2) and the inference frame x ↦^{π1 ⊖ π3} (v, x1, x2) ⋆ (π2 ⊖ π3) · tree(x1), respectively.
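The share bookkeeping in this example can be sketched numerically over rational shares (the concrete values of π1, π2, π3 below are hypothetical, chosen only to exercise both directions of ⊖):

```python
from fractions import Fraction as Q

def sub(a, b):
    """Residue operator ⊖ on rational shares (total)."""
    return a - b if a > b else Q(0)

# Hypothetical instance: the antecedent owns x with pi1 and tree(x1)
# with pi2; the consequent demands pi3 of each.
pi1, pi2, pi3 = Q(3, 4), Q(1, 2), Q(2, 3)

antiframe = {"x maps-to": sub(pi3, pi1), "tree(x1)": sub(pi3, pi2)}
frame     = {"x maps-to": sub(pi1, pi3), "tree(x1)": sub(pi2, pi3)}

# x is already over-supplied (pi1 > pi3): nothing abduced, residue framed out.
assert antiframe["x maps-to"] == Q(0) and frame["x maps-to"] == Q(1, 12)
# tree(x1) is under-supplied (pi2 < pi3): the missing share is abduced.
assert antiframe["tree(x1)"] == Q(1, 6) and frame["tree(x1)"] == Q(0)

# Condition C1 balances both sides of the completed entailment.
assert pi1 + antiframe["x maps-to"] == pi3 + frame["x maps-to"]
assert pi2 + antiframe["tree(x1)"] == pi3 + frame["tree(x1)"]
```

This is precisely the computation Msub and Psub perform symbolically: each matched pair of fragments contributes a residue to the antiframe or to the frame, never to both.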

**Fig. 5.** Fractional bi-abductive inference for the running example: the abductive-inference derivation and the frame-inference derivation.

*Antiframe Inference and Disjointness.* Consider the following abduction problem:

$$x \mapsto (v, x\_1, x\_2) \star \mathsf{tree}(x\_1) \star [??] \vdash \mathsf{tree}(x)$$

Using the folding rule F2, we can identify the antiframe as tree(x2). Now suppose we have a rational permission π ∈ Q distributed everywhere, *i.e.*:

$$x \stackrel{\pi}{\longmapsto} (v, x\_1, x\_2) \star \pi \cdot \mathsf{tree}(x\_1) \star [??] \vdash \pi \cdot \mathsf{tree}(x)$$

A naïve solution is to let the antiframe be π · tree(x2). However, in Q this choice is unsound due to the deformation of recursive structures illustrated in Fig. 1: if the antiframe is π · tree(x2), the left-hand side can be a DAG, even though the right-hand side must be a tree. In disjoint share models like T, however, choosing π · tree(x2) for the antiframe is correct and the entailment holds. As is often the case, things are straightforward once the definitions are correct.

### **5 A Proof Theory for Fractional Permissions**

Our main objective in this section is to show how to discharge the uniformity and precision side conditions required by the DotStar and DotPlus rules. To handle recursive predicates like tree(x), we develop a set of novel modal-logic-based proof rules to carry out induction in the heap. To allow tools to leverage existing entailment checkers, all of these techniques are carried out **in the object logic itself**, rather than in the metalogic. Thus, in Sect. 5, we do not assume a concrete model for our object logic (in Sect. 7 we will develop a model).

First we discuss new proof rules for predicate multiplication and fractional maps-to (Sect. 5.1), precision (Sect. 5.2), and induction over fractional heaps (Sect. 5.3). We then conclude (Sect. 5.4) with two examples of proving real properties using our proof theory: that tree(x) is F-uniform and that list(x) is precise. Some of the theorems have delicate proofs, so all of them have been verified in Coq [1].

### **5.1 Proof Theory for Predicate Multiplication and Fractional Maps-To**

In Sect. 3 we presented the key rules that someone who wants to verify programs using predicate multiplication is likely to find convenient. On page 13 we present a series of additional rules, mostly used to establish the "uniform" and "precise" side conditions necessary in our proofs.

Figure 6 is the simplest group, giving basic facts about the fractional points-to predicate. Only ↦-inversion is not immediate from the nonfractional case. It says that it is impossible to have two fractional maps-to facts for the same address with two different values. We need this fact to *e.g.* prove that predicates with existentials, such as tree, are precise.

$$\begin{array}{ll}
(x \stackrel{\pi}{\longmapsto} y\_1) \land (x \stackrel{\pi'}{\longmapsto} y\_2) \vdash \lceil y\_1 = y\_2 \rceil & {\mapsto}\text{ inversion} \\
x \stackrel{\pi}{\longmapsto} y \vdash \lnot \mathsf{emp} & {\mapsto}\text{ emp} \\
x \stackrel{\pi}{\longmapsto} y \vdash \lceil x \neq \mathsf{null} \rceil & {\mapsto}\text{ null}
\end{array}$$

**Fig. 6.** Proof theory for fractional maps-to

$$\begin{array}{ccc}
\mathsf{emp} \vdash \mathsf{uniform}(\pi)\ \text{uniform/emp} &
\mathsf{uniform}(\pi) \star \mathsf{uniform}(\pi) \dashv\vdash \mathsf{uniform}(\pi)\ \text{uniform}{\star} &
x \stackrel{\pi}{\longmapsto} y \vdash \mathsf{uniform}(\pi)\ {\mapsto}\text{uniform} \\[1ex]
\dfrac{P \vdash \mathsf{uniform}(\pi)}{\pi' \cdot P \vdash \mathsf{uniform}(\pi' \otimes \pi)}\ \text{uniformDot} &
\mathsf{precise}(x \stackrel{\pi}{\longmapsto} y)\ {\mapsto}\text{precise} &
\mathsf{precise}(\pi \cdot P) \Leftrightarrow \mathsf{precise}(P)\ \text{DotPrecise}
\end{array}$$

### **Fig. 7.** Uniformity and precision for predicate multiplication

$$\begin{array}{c}
\dfrac{G \vdash \mathsf{precisely}(P) \quad G \vdash \mathsf{precisely}(Q)}{G \vdash \mathsf{precisely}(P \star Q)}\ \text{precisely}{\star}
\qquad
\dfrac{\top \vdash \mathsf{precisely}(P)}{\mathsf{precise}(P)}\ \text{preciselyPrecise}
\\[2ex]
\mathsf{precisely}(P) \vdash \big((P \star Q) \land (P \star R)\big) \Rightarrow P \star (Q \land R)\ \text{preciselyLeft}
\qquad
\dfrac{\forall Q, R.\; G \vdash \big((P \star Q) \land (P \star R)\big) \Rightarrow P \star (Q \land R)}{G \vdash \mathsf{precisely}(P)}\ \text{preciselyRight}
\\[2ex]
\dfrac{\exists x.\; G \vdash \mathsf{precisely}\big(P(x)\big)}{G \vdash \mathsf{precisely}\big(\forall x. P(x)\big)}\ \text{precisely}\forall
\qquad
\dfrac{G \vdash \mathsf{precisely}(P)}{G \vdash \mathsf{precisely}(P \land Q)}\ \text{precisely}{\land}
\\[2ex]
\dfrac{\forall x.\; G \vdash \mathsf{precisely}\big(P(x)\big) \quad \forall x, y.\; G \land (P(x) \star \top) \land (P(y) \star \top) \vdash \lceil x = y \rceil}{G \vdash \mathsf{precisely}\big(\exists x. P(x)\big)}\ \text{precisely}\exists
\\[2ex]
\dfrac{G \vdash \mathsf{precisely}(P) \quad G \vdash \mathsf{precisely}(Q) \quad G \land (P \star \top) \land (Q \star \top) \vdash \bot}{G \vdash \mathsf{precisely}(P \lor Q)}\ \text{precisely}{\lor}
\end{array}$$

### **Fig. 8.** Proof theory for precision

**Fig. 9.** Proof rules for induction over the finiteness of the heap, built from the "within" (▷) and "shrinking" (▷π) operators of Sect. 5.3.

*Proving the side conditions for* DotPlus *and* DotStar*.* Figure 7 contains rules for establishing that P is π-uniform (*i.e.* P ⊢ uniform(π)) and that P is precise. Since uniformity is a simple property, the rules are easy to state.

To use predicate multiplication we need to prove two kinds of side conditions: uniformity and precision. The uniform/emp rule tells us that emp is π-uniform for every π; the conclusion (all defined heap locations are held with share π) is vacuously true. The uniformDot rule tells us that if P is π-uniform, then multiplying P by a fraction π′ yields a (π′ ⊗ π)-uniform predicate. The ↦uniform rule tells us that points-to is uniform. The uniform⋆ rule possesses interesting characteristics. The right-to-left direction follows from uniform/emp and the emp rule (P ⊣⊢ emp ⋆ P). The left-to-right direction is not automatic but very useful: one consequence is that from P ⊢ uniform(π) and Q ⊢ uniform(π) we can prove P ⋆ Q ⊢ uniform(π). This direction follows from disjointness but fails over non-disjoint models such as the rationals Q.

The ↦precise rule tells us that points-to facts are precise. The DotPrecise rule is a partial solution to proving precision: it states that π · P is precise if and only if P is precise. We will next show how to prove that P itself is precise.

#### **5.2 Proof Theory for Proving that Predicates Are Precise**

Proving that a predicate is π-uniform is relatively straightforward using the proof rules presented so far. However, proving that a predicate is precise is not as pleasant. Traditionally precision is defined (and checked for concrete predicates) in the metalogic [40] using the following definition:

$$\mathsf{precise}(P) \stackrel{\text{def}}{=} \forall h, h\_1, h\_2. \; h\_1 \subseteq h \Rightarrow h\_2 \subseteq h \Rightarrow (h\_1 \models P) \Rightarrow (h\_2 \models P) \Rightarrow h\_1 = h\_2 \tag{3}$$

Here we write h1 ⊆ h2 to mean that h1 is a subheap of h2, *i.e.* ∃h′. h1 ⊕ h′ = h2, where ⊕ is the joining operation of the underlying separation algebra [21]. Essentially, precision is a kind of uniqueness property: if a predicate P is precise then it can be true on at most one subheap of any given heap.

Rather than checking precision in the metalogic, we wish to do so in the object logic. We give a proof theory that lets us do so in Fig. 8. Among other advantages, proving precision in the object logic lets tools build on existing separation logic entailment checkers to prove the precision of recursive predicates. The core idea is simple: we define a new object logic operator "precisely(P)" that captures the notion of precision relativized to the current heap; essentially it is a partially applied version of the definition of precise(P) in Eq. (3):

$$h \models \mathsf{precisely}(P) \stackrel{\text{def}}{=} \forall h\_1, h\_2. \; h\_1 \subseteq h \Rightarrow h\_2 \subseteq h \Rightarrow (h\_1 \models P) \Rightarrow (h\_2 \models P) \Rightarrow h\_1 = h\_2 \tag{4}$$

Although we have given precisely's model to aid intuition, we emphasize that in Sect. 5 all of our proofs take place in the object logic; we never unfold precisely's definition. Note that precisely is also generally weaker than the typical notion of precision. For example, the predicate x ↦ 7 ∨ y ↦ 7 is not precise; however, the entailment z ↦ 8 ⊢ precisely(x ↦ 7 ∨ y ↦ 7) is provable from Fig. 8.
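The distinction can be seen concretely by checking Eq. (3)'s quantification directly over tiny non-fractional heaps, here modeled as sets of (address, value) cells (a sketch; the names `subheaps` and `precise_on` are ours):

```python
from itertools import combinations

def subheaps(h):
    """All subheaps of h: every subset of its cells joins with the rest to give h."""
    return [frozenset(s) for k in range(len(h) + 1) for s in combinations(h, k)]

def precise_on(P, h):
    """The relativized check of Eq. (4): at most one subheap of h satisfies P."""
    sats = [h1 for h1 in subheaps(h) if h1 in P]
    return all(h1 == h2 for h1 in sats for h2 in sats)

x, y = "x", "y"
# The predicate  x |-> 7  \/  y |-> 7,  as its set of satisfying heaps.
P = {frozenset({(x, 7)}), frozenset({(y, 7)})}

# On a heap containing only z |-> 8 the check holds vacuously, which is why
# z |-> 8 |- precisely(P) is provable ...
assert precise_on(P, frozenset({("z", 8)}))
# ... yet P is not precise: a larger heap has two distinct satisfying subheaps.
assert not precise_on(P, frozenset({(x, 7), (y, 7)}))
```

Quantifying `precise_on` over *all* heaps recovers the standard metalogic notion of Eq. (3), matching the preciselyPrecise rule.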

That said, the two notions are closely connected, as given by the preciselyPrecise rule. We also give an introduction rule preciselyRight and an elimination rule preciselyLeft that connect precision with an "antidistribution" of ⋆ over ∧.

We also give a number of rules showing how precisely combines with the connectives of our logic. The rules for propositional conjunction ∧ and separating conjunction ⋆ follow well-understood patterns, with the addition of an arbitrary premise context G being the key feature. The rule for disjunction ∨ is a little trickier, with an additional premise that forces the disjunction to be exclusive rather than inclusive. An example of such an exclusive disjunction is in the standard definition of the tree predicate, where the first disjunct x = null is fundamentally incompatible with the second disjunct ∃d, l, r. x ↦ (d, l, r) ⋆ …, since ↦ does not allow the address to be null (by the rule ↦null from Fig. 6). The rules for universal quantification ∀ and existential quantification ∃ are essentially generalizations of the rules for conjunction ∧ and disjunction ∨.

It is now straightforward to prove the precision of simple predicates such as x = null ∨ (∃y. x ↦ y ⋆ y ↦ 0). Finding and proving the key lemmas that enable the proof of precision of recursive predicates remains a little subtle.

### **5.3 Proof Theory for Induction over the Finiteness of the Heap**

Recursive predicates such as list(x) and tree(x) are common in SL. However, proving properties of such predicates, *e.g.* that list(x) is precise, is a little tricky, since the μFoldUnfold rule provided by the Tarski fixed point does not automatically provide an induction principle. Generally speaking, such properties follow by some kind of induction argument, either over auxiliary parameters (*e.g.* if we augment trees to have the form tree(x, τ), where τ is an inductively defined type in the metalogic) or over the finiteness of the heap itself. Both arguments usually occur in the metalogic rather than the object logic.

We have two contributions to make for proving inductive properties. First, we show how to do induction over the heap in a fractional setting. Intuitively this is more complicated than in the non-fractional case because there are infinite sequences of strictly smaller subheaps. That is, for a given initial heap h0, there are infinite sequences h1, h2, . . . such that h0 ⊋ h1 ⊋ h2 ⊋ ⋯. The disjointness property does not fundamentally change this issue, so we illustrate with an example using shares in ℚ. The heap h0 satisfying x ↦^1 y is strictly larger than the heap h1 satisfying x ↦^{1/2} y, which is strictly larger than the heap h2 satisfying x ↦^{1/4} y; in general hi satisfies x ↦^{1/2^i} y. Since our sequence is infinite, we cannot use it as the basis for an induction argument. The solution is to require that the heaps decrease by at least some constant size c. If each subsequent heap must shrink by at least, *e.g.*, c = 0.25 of a memory cell, then the sequence must be finite, just as in the non-fractional case (*i.e.* c = F). More sophisticated approaches are conceivable (*e.g.* limits) but they are not easy to automate and we did not find any practical examples that require such methods.
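
The counting argument above can be sketched executably. In this illustrative Python fragment (rational shares only; the heap encoding and function names are ours, not the paper's), halving produces an unbounded descending chain, while demanding that each step surrender at least a c-sized piece bounds any chain by the total share mass divided by c:

```python
from fractions import Fraction

# A toy fractional heap: address -> owned share in (0, 1].
h0 = {"x": Fraction(1)}

def halve(h):
    """A strictly smaller subheap: keep half of every owned share."""
    return {a: s / 2 for a, s in h.items()}

# With no lower bound on the decrease, the chain h0 > h1 > h2 > ... never ends:
chain = [h0]
for _ in range(10):
    chain.append(halve(chain[-1]))
assert chain[3]["x"] == Fraction(1, 8)  # h_i satisfies x |->^(1/2^i) y

def max_chain_length(h, c):
    """If every step must remove at least a c-sized piece, a descending
    chain starting at h can take at most (total share mass) / c steps."""
    mass = sum(h.values())
    return int(mass / c)

print(max_chain_length(h0, Fraction(1, 4)))  # 4
```
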

Our second contribution is the development of a proof theory in the object logic that can carry out these kinds of induction proofs in a relatively straightforward way. The proof rules that let us do so are given in Fig. 9. Once good lemmas are identified, we find doing induction proofs over the finite heap formally in the object logic simpler than doing the same proofs in the metalogic.

The key to our induction rules is two new operators: "within" ⊙ and "shrinking" ⊳π. Essentially ⊳πP is used as an induction guard, preventing us from applying our induction hypothesis P until we are on a π-smaller subheap. When π = F we sometimes write just ⊳P. Semantically, if h satisfies ⊳πP then P is true **on all strict subheaps of** h **that are smaller by at least a** π**-piece**. Accordingly, the key elimination rule ⊳π may seem natural: it verifies that the induction guard is satisfied and unlocks the underlying hypothesis. To start an induction proof of an arbitrary goal ⊨ P, we use the rule W to introduce an induction hypothesis, resulting in the new entailment goal ⊳πP ⊢ P.

Some definitions, such as list(x), have only one "recursive call"; others, such as tree(x), have more than one. Moreover, sometimes we wish to apply our inductive hypothesis immediately after satisfying the guard, whereas other times it is convenient to satisfy the guard somewhat before we need the inductive hypothesis. To handle both of these issues we use the "within" operator ⊙, where h ⊨ ⊙P means P is true on all subheaps of h; this is the intuition behind the rule ⊙⋆. To apply our induction hypothesis somewhat after meeting its guard (or if we wish to apply it more than once) we use the ⊙⊳π rule to add the ⊙ modality before eliminating the guard. We will see an example of this shortly.

### **5.4 Using Our Proof Theory**

We now turn to two examples of using our proof theory from page 13 to demonstrate that the rule set is strong and flexible enough to prove real properties.

*Proving that* tree(x) *is* F*-uniform.* Our logical rules for induction and uniformity are able to establish the uniformity of predicates in a fairly simple way. Here we focus on the tree(x) predicate because it is a little harder due to the two recursive "calls" in its unfolding. For convenience, we will write u(π) instead of uniform(π).

Our initial proof goal is tree(x) ⊢ u(F). Standard natural deduction arguments then reach the goal ∀x. tree(x) ⇒ u(F), after which we apply the W rule (π = F is convenient) to start the induction, adding the hypothesis ⊳∀x. tree(x) ⇒ u(F), which we strengthen with the ⊙⊳π rule to reach ⊳⊙∀x. tree(x) ⇒ u(F). Natural deduction from there reaches

$$\left( (x = \mathtt{null}) \lor \exists d, l, r.\, x \mapsto (d, l, r) \star \mathtt{tree}(l) \star \mathtt{tree}(r) \right) \land \left( \rhd \odot \forall x.\, \mathtt{tree}(x) \Rightarrow \mathtt{u}(\mathcal{F}) \right) \vdash \mathtt{u}(\mathcal{F})$$

The proof breaks into two cases. The first reduces to (x = null) ∧ (⊳⊙ ···) ⊢ u(F), which follows from the uniform/emp rule. The second case reduces to x ↦ (d, l, r) ⋆ tree(l) ⋆ tree(r) ∧ ⊳⊙∀x. tree(x) ⇒ u(F) ⊢ u(F). Then the uniform⋆ rule gives

$$\left(x \mapsto (d, l, r) \star \left(\mathsf{tree}(l) \star \mathsf{tree}(r)\right)\right) \land \left(\rhd \odot \forall x. \mathsf{tree}(x) \Rightarrow \mathsf{u}(\mathcal{F})\right) \vdash \mathsf{u}(\mathcal{F}) \star \mathsf{u}(\mathcal{F})$$

We can now cut with the ⊳π rule to meet the inductive guard, since x ↦ (d, l, r) ⊢ uniform(F) ∧ ¬emp due to the rules ↦uniform and ↦emp. Our remaining goal is thus

$$\left(x \mapsto (d, l, r) \land \rhd \cdots\right) \star \left(\left(\mathsf{tree}(l) \star \mathsf{tree}(r)\right) \land \odot \forall x. \mathsf{tree}(x) \Rightarrow \mathsf{u}(\mathcal{F})\right) \vdash \mathsf{u}(\mathcal{F}) \star \mathsf{u}(\mathcal{F})$$

We split over ⋆. The first goal is x ↦ (d, l, r) ∧ ⊳ ··· ⊢ u(F), which follows from ↦u. The second goal is (tree(l) ⋆ tree(r)) ∧ ⊙∀x. tree(x) ⇒ u(F) ⊢ u(F). We apply ⊙⋆ to distribute the inductive hypothesis into the ⋆, and uniform⋆ to split the right-hand side, yielding

$$\left(\mathsf{tree}(l)\land\odot\forall x.\mathsf{tree}(x)\Rightarrow\mathsf{u}(\mathcal{F})\right)\star\left(\mathsf{tree}(r)\land\odot\forall x.\mathsf{tree}(x)\Rightarrow\mathsf{u}(\mathcal{F})\right)\vdash\mathsf{u}(\mathcal{F})\star\mathsf{u}(\mathcal{F})$$

We again split over ⋆ to reach two essentially identical cases. We apply rule T to remove the ⊙ and then reach, *e.g.*, ∀x. tree(x) ⇒ u(F) ⊢ tree(l) ⇒ u(F), which is immediate. Further details on this proof can be found in the full paper [2].

*Proving that* list(x) *is precise.* Precision is more complex than π-uniformity, so it is harder to prove. We will use the simpler list(x) as an example; the additional tricks we need to prove that tree(x) is precise are applications of the ⊙⊳π and ⊙⋆ rules in the same manner as in the proof that tree(x) is F-uniform. We have proved that both list(x) and tree(x) are precise using our proof rules in Coq [1].

$$
\begin{array}{cc}
(A)\;\; \mathsf{precisely}(P) \;\dashv\vdash\; (P \star \top) \Rightarrow \mathsf{precisely}(P)
&
(D)\;\; \dfrac{\mathsf{precise}(P)}{P \star \mathsf{precisely}(Q) \vdash \mathsf{precisely}(P \star Q)}
\\[1.5em]
(B)\;\; \dfrac{\begin{array}{c} Q \land (R \star \top) \vdash \mathsf{precisely}(R) \qquad Q \land (S \star \top) \vdash \mathsf{precisely}(S) \\ (R \star \top) \land (S \star \top) \vdash \bot \end{array}}{Q \land \big((R \lor S) \star \top\big) \vdash \mathsf{precisely}(R \lor S)}
&
(C)\;\; \dfrac{\begin{array}{c} \forall x.\, \big(Q \land (P(x) \star \top) \vdash \mathsf{precisely}(P(x))\big) \\ \forall x, y.\, \big((P(x) \star \top) \land (P(y) \star \top) \vdash x = y\big) \end{array}}{Q \land \big((\exists x. P(x)) \star \top\big) \vdash \mathsf{precisely}(\exists x. P(x))}
\end{array}
$$

**Fig. 10.** Key lemmas we use to prove recursive predicates precise

In Fig. 10 we give four key lemmas used in our proof<sup>2</sup>. All four are derived (with a little cleverness) from the proof rules given in Fig. 8. We sketch the proof as follows. To prove precise(list(x)) we first use the preciselyPrecise rule to transform the goal into precisely(list(x)). We cannot immediately apply rule W, however, since without a concrete ⋆-separated conjunct **outside** the precisely, we cannot dismiss the inductive guard with the ⊳π rule. Accordingly, we next use lemma (A) and standard natural deduction to reach the goal ∀x. (list(x) ⋆ ⊤) ⇒ precisely(list(x)), after which we apply rule W with π = F.

Afterwards we do some standard natural deduction steps yielding the goal

$$\left(\rhd \forall x. (\mathsf{list}(x) \star \top) \Rightarrow \mathsf{precisely}(\mathsf{list}(x))\right) \land \left(\left((x = \mathsf{null}) \lor \exists d, n.\, x \mapsto (d, n) \star \mathsf{list}(n)\right) \star \top\right) \vdash \mathsf{precisely}\left((x = \mathsf{null}) \lor \exists d, n.\, x \mapsto (d, n) \star \mathsf{list}(n)\right)$$

We are now in a position to apply lemma (B) to break up the disjunction. We now have three goals. The first goal is that x = null is precise, which follows from the fact that emp is precise, which in turn can be proved using the rule preciselyRight. The third goal is that the two branches of the disjunction are mutually incompatible, which follows from x = null being incompatible with maps-to by rule ↦null. The second (and last remaining) goal needs to use lemma (C) twice to break up the existentials. Two of the three new goals are to show that the two existentials are uniquely determined, which follow from ↦inversion, leaving the goal

$$\left(\rhd \forall x.\left(\textsf{list}(x) \star \top\right) \Rightarrow \textsf{precisely}\left(\textsf{list}(x)\right)\right) \land \left(x \mapsto \left(d,n\right) \star \left(\textsf{list}(n) \star \top\right)\right) \vdash \textsf{precisely}\left(x \mapsto \left(d,n\right) \star \textsf{list}(n)\right)$$

We now cut with lemma (D), using rule ↦precise to prove its premise, yielding

$$\left(\rhd \forall x.\left(\mathsf{list}(x)\star\top\right)\Rightarrow \mathsf{precisely}\left(\mathsf{list}(x)\right)\right)\land\left(x\mapsto \left(d,n\right)\star\left(\mathsf{list}(n)\star\top\right)\right)\vdash x\mapsto \left(d,n\right)\star\mathsf{precisely}\left(\mathsf{list}(n)\right)$$

We now use the ⊳π rule to defeat the inductive guard. The rest is straightforward. Further details on this proof can be found in the full paper [2].

### **6 The ShareInfer Fractional Biabduction Engine**

Having described our logical machinery in Sects. 3, 4 and 5, we now demonstrate that our techniques are well-suited to automation by documenting our ShareInfer

<sup>2</sup> We abuse notation by reusing the inference rule format to present derived lemmas.


**Fig. 11.** Evaluation of our proof systems using ShareInfer

prototype [1]. Our tool is capable of checking whether a user-defined recursive predicate such as list or tree is uniform and/or precise and then conducting biabductive inference over a separation logic entailment containing said predicates.

To check uniformity, the tool first uses heuristics to guess a potential tree share candidate π and then applies proof rules in Figs. 7 and 6 to derive the goal uniform(π). To support more flexibility, our tool also allows users to specify the candidate share π manually. To check precision, the tool maneuvers over the proof rules in Figs. 6 and 8 to achieve the desired goal. In both cases, recursive predicates are handled with the rules in Fig. 9. ShareInfer returns either Yes, No or Unknown together with a human-readable proof of its claim.

For bi-abduction, ShareInfer automatically checks precision and uniformity whenever it encounters a new recursive predicate. If the check returns Yes, the tool will unlock the corresponding rule, *i.e.*, DotPlus for precision and DotStar for uniformity. ShareInfer then matches fragments between the consequent and antecedent while applying folding and unfolding rules for recursive predicates to construct the antiframe and inference frame respectively. For instance, here is the biabduction problem contained in the file bi_tree2 (see Fig. 11):

$$a \overset{\mathcal{F}}{\mapsto} (b, c, d) \;\star\; \mathcal{L} \cdot \mathsf{tree}(c) \;\star\; \mathcal{R} \cdot \mathsf{tree}(d) \;\star\; [??] \;\vdash\; \mathcal{L} \cdot \mathsf{tree}(a) \;\star\; [??]$$

ShareInfer returns antiframe L·tree(d) and inference frame a ↦^R (b, c, d) ⋆ R·tree(d).

ShareInfer is around 2.5k LOC of Java. We benchmarked it with 27 selected examples from three categories: precision, uniformity and bi-abduction. The benchmark was conducted with a 3.4 GHz processor and 16 GB of memory. Our results are given in Fig. 11. Despite the complexity of our proof rules, our performance is reasonable: ShareInfer took only 75.9 ms to run the entire example set, or around 2.8 ms per example. Our benchmark is small, but this performance indicates that more sophisticated separation logic verifiers such as HIP/SLEEK [14] or Infer [9] may be able to use our techniques at scale.

### **7 Building a Model for Our Logic**

Our task now is to provide a model for our proof theories. We present our models in several parts. In Sect. 7.1 we begin with a brief review of Cancellative Separation Algebras (CSAs). In Sect. 7.2 we explain what we need from our fractional share models. In Sect. 7.3 we develop an extension to CSAs called "Scaling Separation Algebras" (SSAs). In Sect. 7.5 we develop the machinery necessary to support our rules for object-level induction over the heap. We have verified in Coq [1] that the models in Sect. 7.1 support the rules in Fig. 8, the models in Sect. 7.3 support the rules in Figs. 3 and 7, and the models in Sect. 7.5 support the rules in Fig. 9.

### **7.1 Cancellative Separation Algebras**

A Separation Algebra (SA) is a set H with an associative, commutative partial operation ⊕. Separation algebras can have a single unit or multiple units; we use *identity*(x) to indicate that x is a unit. A Cancellative SA ⟨H, ⊕⟩ further requires that a ⊕ b1 = c ⇒ a ⊕ b2 = c ⇒ b1 = b2. We can define a partial order on H using ⊕ by h1 ⊆ h2 def= ∃h′. h1 ⊕ h′ = h2. Calcagno *et al.* [12] showed that CSAs can model separation logic with the definitions

$$h \Vdash P \star Q \overset{\text{def}}{=} \exists h\_1, h\_2. \ h\_1 \oplus h\_2 = h \land (h\_1 \Vdash P) \land (h\_2 \Vdash Q) \quad \text{and} \quad h \Vdash \texttt{emp} \overset{\text{def}}{=} identity(h).$$
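
These definitions are directly executable. The following Python sketch (heaps as dicts with disjoint union as ⊕; the encoding and names are ours, for illustration only) models the CSA semantics of ⋆ and emp by enumerating heap splittings:

```python
from itertools import combinations

# A simple CSA instance: heaps are dicts, ⊕ is union of disjoint-domain heaps.
# Predicates are functions from heaps to bool.

def emp(h):
    return h == {}  # the identity element of this CSA is the empty heap

def mapsto(x, y):
    return lambda h: h == {x: y}

def star(P, Q):
    """h |= P * Q  iff  h splits as h1 ⊕ h2 with h1 |= P and h2 |= Q."""
    def holds(h):
        addrs = list(h)
        for r in range(len(addrs) + 1):
            for dom1 in combinations(addrs, r):
                h1 = {a: h[a] for a in dom1}
                h2 = {a: h[a] for a in addrs if a not in dom1}
                if P(h1) and Q(h2):
                    return True
        return False
    return holds

h = {"x": 1, "y": 2}
print(star(mapsto("x", 1), mapsto("y", 2))(h))  # True
print(star(mapsto("x", 1), emp)(h))             # False: emp needs the rest empty
```
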

The standard definition of precise(P) was given as Eq. (3) in Sect. 5.2, together with the definition of our new precisely(P) operator in Eq. (4). What is difficult here is finding a set of axioms (Fig. 8) and derivable lemmas (*e.g.* Fig. 10) that are strong enough to be useful in the object-level inductive proofs. Once the axioms are found, proving them from the given model is straightforward. Cancellation is not necessary to model basic separation logic [18], but we need it to prove the introduction rule preciselyRight and elimination rule preciselyLeft for our new operator.

### **7.2 Fractional Share Algebras**

A fractional share algebra (FSA) ⟨S, ⊕, ⊗, E, F⟩ is a set S with two operations: partial addition ⊕ and total multiplication ⊗. The substructure ⟨S, ⊕⟩ is a CSA with the single unit E. For the reasons discussed in Sect. 2 we require that ⊕ satisfy the disjointness axiom a ⊕ a = b ⇒ a = E. Furthermore, we require the existence of a top element F, representing complete ownership, and assume that each element s ∈ S has a complement s̄ such that s ⊕ s̄ = F.

Often (*e.g.* in the fractional ↦ operator) we wish to restrict ourselves to the "positive shares" S+ def= S \ {E}. To emphasize that a share is positive we often use the metavariable π rather than s. ⊕ remains associative, commutative, and cancellative; every element other than F still has a complement. To enjoy a partial order on S+ and other SA- or CSA-like structures that lack identities (sometimes called "permission algebras") we define π1 ⊆ π2 def= (∃π′. π1 ⊕ π′ = π2) ∨ (π1 = π2).

For the multiplicative structure we require that ⟨S, ⊗, F⟩ be a monoid, *i.e.* that ⊗ is associative and has identity F. Since we restrict maps-tos and the permission scaling operator to be positive, we want ⟨S+, ⊗, F⟩ to be a submonoid. Accordingly, when {π1, π2} ⊂ S+, we require that π1 ⊗ π2 ≠ E. Finally, we require that ⊗ distribute over ⊕ on the right, that is (s1 ⊕ s2) ⊗ s3 = (s1 ⊗ s3) ⊕ (s2 ⊗ s3); and that ⊗ be cancellative on the right given a positive left multiplicand, *i.e.* π ⊗ s1 = π ⊗ s2 ⇒ s1 = s2.

The tree share model we present in Sect. 2 satisfies all of the above axioms, so we have a nontrivial model. As we will see shortly, it would be very convenient if we could assume that ⊗ also distributed on the left, or if we had multiplicative inverses on the left rather than merely cancellation on the right. However, we will see in Sect. 8.2 that both assumptions are untenable.
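
For concreteness, tree share operations can be prototyped and the axioms checked on samples. The sketch below uses our own encoding (leaves are booleans, nodes are pairs, kept in canonical form; the orientation of ⊗ is one of the two symmetric choices); it exhibits disjointness, right distributivity on a sample, and a concrete failure of left distributivity:

```python
# Tree shares: False = empty leaf, True = full leaf, (l, r) = node.
# Canonical form: a node never has two equal *leaf* children.

def canon(t):
    if isinstance(t, tuple):
        l, r = canon(t[0]), canon(t[1])
        if l == r and not isinstance(l, tuple):
            return l
        return (l, r)
    return t

def add(a, b):
    """Partial ⊕: leafwise disjoint union; None when undefined."""
    if a is False: return b
    if b is False: return a
    if a is True or b is True: return None  # overlapping ownership
    l, r = add(a[0], b[0]), add(a[1], b[1])
    return None if l is None or r is None else canon((l, r))

def mul(a, b):
    """Total ⊗ (a ⊗ b): substitute a for each full leaf of b."""
    if b is True: return a
    if b is False: return False
    return canon((mul(a, b[0]), mul(a, b[1])))

E, F = False, True
L, R = (True, False), (False, True)   # left half, right half

assert add(L, R) is F                       # complements: L ⊕ R = F
assert add(L, L) is None                    # disjointness: L ⊕ L undefined
assert mul(F, L) == L and mul(L, F) == L    # F is the ⊗ identity
# Right distributivity (s1 ⊕ s2) ⊗ s3 = (s1 ⊗ s3) ⊕ (s2 ⊗ s3) holds:
assert mul(add(L, R), L) == add(mul(L, L), mul(R, L))
# ...but left distributivity fails: L ⊗ (L ⊕ R) ≠ (L ⊗ L) ⊕ (L ⊗ R)
assert mul(L, add(L, R)) != add(mul(L, L), mul(L, R))
```
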

### **7.3 Scaling Separation Algebra**

A scaling separation algebra (SSA) is ⟨H, S, ⊕H, ⊕S, ⊗S, E, F, *mul*, *force*⟩, where ⟨H, ⊕H⟩ is a CSA for heaps and ⟨S, ⊕S, ⊗S, E, F⟩ is an FSA for shares. Intuitively, *mul*(π, h1) multiplies every share inside h1 by π and returns the result h2. The multiplication is on the left, so for each original share π′ in h1, the resulting share in h2 is π ⊗S π′. Recall that the informal meaning of π · P is that we have a π-fraction of predicate P. Formally this notion relies on a little trick:

$$h \models \pi \cdot P \quad \overset{\text{def}}{=} \quad \exists h'.\; mul(\pi, h') = h \land h' \models P \tag{5}$$

A heap h contains a π-fraction of P if there is a **bigger** heap h′ satisfying P, and multiplying that bigger heap h′ by the scalar π gets back to the smaller heap h.

The simpler *force*(π, h1) overwrites all shares in h1 with the constant share π to reach the resulting heap h2. We use *force* to define the uniform predicate as h ⊨ uniform(π) def= *force*(π, h) = h. A heap h is π-uniform when setting all the shares in h to π gets you back to h—*i.e.*, they must have been π to begin with.
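
For intuition, *force*, *mul*, uniform, and the trick behind π · P can be sketched in a toy heap model (rational shares for readability; the representation and names are ours, not the paper's Coq development):

```python
from fractions import Fraction

# Toy heap: address -> (share, value); shares are rationals for readability.

def mul(pi, h):
    """Multiply every share in h by pi (scaling on the left)."""
    return {a: (pi * s, v) for a, (s, v) in h.items()}

def force(pi, h):
    """Overwrite every share in h with the constant share pi."""
    return {a: (pi, v) for a, (s, v) in h.items()}

def uniform(pi, h):
    """h |= uniform(pi)  iff  force(pi, h) = h."""
    return force(pi, h) == h

def pi_dot(pi, P):
    """h |= pi . P  iff  there is a bigger h' with mul(pi, h') = h and h' |= P.
    Rationals have multiplicative inverses, so we can compute h' directly;
    tree shares do not, which is why the real model only asserts existence."""
    return lambda h: P({a: (s / pi, v) for a, (s, v) in h.items()})

half = Fraction(1, 2)
h = {"x": (half, 7)}
assert uniform(half, h)
assert mul(half, {"x": (Fraction(1), 7)}) == h
full_x = lambda hp: hp == {"x": (Fraction(1), 7)}   # hp |= x |->^1 7
assert pi_dot(half, full_x)(h)                      # h |= 1/2 . (x |->^1 7)
```
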

$$
\begin{array}{ll}
S_1.\ \mathit{force}(\pi, \mathit{force}(\pi', a)) = \mathit{force}(\pi, a) & S_2.\ \mathit{mul}(\pi, \mathit{mul}(\pi', a)) = \mathit{mul}(\pi \otimes_S \pi', a) \\
S_3.\ \mathit{mul}(\pi, \mathit{force}(\pi', a)) = \mathit{force}(\pi \otimes_S \pi', a) & S_4.\ \mathit{force}(\pi, \mathit{mul}(\pi', a)) = \mathit{force}(\pi, a) \\
S_5.\ \mathit{identity}(a) \Rightarrow \mathit{force}(\pi, a) = a & S_6.\ a \subseteq_H \mathit{force}(\mathcal{F}, a) \\
S_7.\ \pi_1 \subseteq \pi_2 \Rightarrow \mathit{force}(\pi_1, a) \subseteq_H \mathit{force}(\pi_2, a) & S_8.\ \mathit{force}(\pi, a) \oplus_H \mathit{force}(\pi, b) = c \Rightarrow c = \mathit{force}(\pi, c) \\
S_9.\ \mathit{identity}(a) \Rightarrow \mathit{mul}(\pi, a) = a & S_{10}.\ \mathit{mul}(\mathcal{F}, a) = a \\
S_{11}.\ \mathit{mul}(\pi, a_1) = \mathit{mul}(\pi, a_2) \Rightarrow a_1 = a_2 & S_{12}.\ \mathit{mul}(\pi, a) \subseteq_H a \\
\multicolumn{2}{l}{S_{13}.\ \pi_1 \oplus_S \pi_2 = \pi_3 \Rightarrow \forall b, c.\ \big(\mathit{mul}(\pi_1, b) \oplus_H \mathit{mul}(\pi_2, b) = c\big) \Leftrightarrow \big(c = \mathit{mul}(\pi_3, b)\big)} \\
\multicolumn{2}{l}{S_{14}.\ \big(\mathit{force}(\pi', a) \oplus_H \mathit{force}(\pi', b) = \mathit{force}(\pi', c)\big) \Leftrightarrow \big(\mathit{mul}(\pi, \mathit{force}(\pi', a)) \oplus_H \mathit{mul}(\pi, \mathit{force}(\pi', b)) = \mathit{mul}(\pi, \mathit{force}(\pi', c))\big)}
\end{array}
$$

**Fig. 12.** The 14 additional axioms for scaling separation algebras beyond those inherited from cancellative separation algebras

We need to understand how all of the ingredients in an SSA relate to each other to prove the core logical rules on page 13. We distill the various relationships we need to model our logic in Fig. 12. Although there are a goodly number of them, most are reasonably intuitive.

Axioms S1 through S4 describe how *force* and *mul* compose with each other. Axioms S5, S9, and S10 give conditions under which *force* and *mul* are identity functions: when either is applied to empty heaps, and when *mul* is applied to the multiplicative identity on shares F. Axioms S6 and S12 relate heap order with forcing the full share F and multiplication by an arbitrary share π. Axiom S7 says that *force* is order-preserving. Axiom S8 is how the disjointness axiom on shares is expressed on heaps: when two π-uniform heaps are joined, the result is π-uniform. Axiom S11 says that *mul* is injective on heaps. Axiom S13 is delicate. In the ⇒ direction, it states that *mul* preserves the share model's join structure on heaps. In the ⇐ direction, S13 is similar to axiom S8, saying that the share model's join structure **must** be preserved. Taking both directions together, S13 translates the **right** distributivity of ⊗S over ⊕S into heaps. The final axiom S14 is a bit of a compromise. We wish we could satisfy

$$S\_{14}'. \qquad a \oplus\_H b = c \quad \Leftrightarrow \quad mult(\pi, a) \oplus\_H mult(\pi, b) = mult(\pi, c)$$

S′14 is a kind of dual to S13, *i.e.* it would translate a **left** distributivity property of ⊗S over ⊕S in the share model into heaps. Unfortunately, as we will see in Sect. 8.2, the disjointness of ⊕S is incompatible with simultaneously supporting both left and right distributivity. Accordingly, S14 weakens S′14 so that it only holds when a and b are π′-uniform (which by S8 forces c to be π′-uniform). We also wish we could satisfy S′15: ∀π, a. ∃b. *mul*(π, b) = a, which corresponds to left multiplicative inverses, but again (Sect. 8.2) disjointness makes this untenable.

### **7.4 Compositionality of Scaling Separation Algebras**

Despite their complex axiomatization, we gain two advantages from developing SSAs rather than directly proving our logical axioms on a concrete model. First, they give us a precise understanding of exactly which operations and properties (S1–S14) are used to prove the logical axioms. Second, following Dockins *et al.* [21] we can build up large SSAs compositionally from smaller SSAs.

To do so cleanly it will be convenient to consider a slight variant of SSAs, "Weak SSAs" (WSSAs), that allow, but do not require, the existence of identity elements in the underlying CSA model. A WSSA satisfies exactly the same axioms as an SSA, except that we use the weaker ⊆H definition we defined for permission algebras, *i.e.* a1 ⊆H a2 def= (∃a′. a1 ⊕H a′ = a2) ∨ (a1 = a2). Note that S5 and S9 are vacuously true when the CSA does not have identity elements. We need identity elements to prove the logical axioms from the model; we only use WSSAs to gain compositionality as we construct a suitable final SSA. Keeping the share components ⟨S, ⊕S, ⊗S, E, F⟩ constant, we give three SSA constructors to get a flavor for what we can do with the remaining components ⟨H, ⊕H, *force*, *mul*⟩.

*Example 1 (Shares).* The share model ⟨S, ⊕S⟩ is an SSA, and the positive (non-E) shares ⟨S+, ⊕⟩ are a WSSA, with *force*S(π, π′) def= π and *mul*S(π, π′) def= π ⊗ π′.

*Example 2 (Semiproduct).* Let ⟨A, ⊕A, *force*A, *mul*A⟩ be an SSA/WSSA, and B be a set. Define (a1, b1) ⊕A×B (a2, b2) = (a3, b3) def= a1 ⊕A a2 = a3 ∧ b1 = b2 = b3, *force*A×B(π, (a, b)) def= (*force*A(π, a), b), and *mul*A×B(π, (a, b)) def= (*mul*A(π, a), b). Then ⟨A × B, ⊕A×B, *force*A×B, *mul*A×B⟩ is an SSA/WSSA.

*Example 3 (Finite partial map).* Let A be a set and ⟨B, ⊕B, *force*B, *mul*B⟩ be an SSA/WSSA. Define f ⊕A⇀B g = h pointwise [21]. Define *force*A⇀B(π, f) def= λx. *force*B(π, f(x)) and likewise *mul*A⇀B(π, f) def= λx. *mul*B(π, f(x)). The structure ⟨A ⇀fin B, ⊕A⇀B, *force*A⇀B, *mul*A⇀B⟩ is an SSA.

Using these constructors, A ⇀fin (S+ × V), *i.e.* finite partial maps from addresses to pairs of positive shares and values, is an SSA and thus can support a model for our logic. We also support other standard constructions, *e.g.* sum types +.

#### **7.5 Model for Inductive Logic**

What remains is to give the model that yields the inductive logic in Fig. 9. The key induction guard modal operator ⊳π is defined as follows:

$$\begin{array}{lcl} h_1\ S_\pi\ h_4 & \overset{\text{def}}{=} & \exists h_2, h_3.\; h_1 \supseteq_H h_2 \;\land\; h_3 \oplus_H h_4 = h_2 \;\land\; (h_3 \models \mathsf{uniform}(\pi) \land \neg \mathsf{emp}) \\ h \models \rhd_\pi P & \overset{\text{def}}{=} & \forall h'.\; (h\ S_\pi\ h') \Rightarrow (h' \models P) \end{array}$$

In other words, ⊳π is a (boxy) modal operator over the relation Sπ, which relates a heap h1 with all heaps that are strict subheaps smaller by at least a π-piece. The model is a little subtle in order to enable the rules ⊳π and ⊙⊳π, which let us handle multiple recursive calls and simplify the engineering. The within operator ⊙ is much simpler to model:

$$h_1\ W\ h_2 \ \overset{\text{def}}{=}\ h_1 \supseteq_H h_2 \qquad\qquad h \models \odot P \ \overset{\text{def}}{=}\ \forall h'.\; (h\ W\ h') \Rightarrow (h' \models P).$$

All of the rules in Fig. 9 follow from these definitions except for rule W. To prove this rule, we require that the heap model have an additional operator. The "π-quantum", written |h|π, gives the number of times a non-empty π-sized piece can be taken out of h. For disjoint shares, this number is no more than the number of defined memory locations in h. We require two facts about |h|π. First, that h1 ⊆H h2 ⇒ |h1|π ≤ |h2|π, *i.e.* that subheaps do not have larger π-quanta than their parents. Second, that h1 ⊕H h2 = h3 ⇒ (h2 ⊨ uniform(π) ∧ ¬emp) ⇒ |h3|π > |h1|π, *i.e.* that taking out a π-piece strictly decreases the number of π-quanta. Given this setup, rule W follows immediately by induction on |h|π. The rules that require the longest proofs in the model are ⊳π and ⊙⊳π.
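
The two facts about the π-quantum can be checked in a toy rational-share model (an illustration with our own encoding; here |h|π counts how many π-sized pieces can be removed from h):

```python
from fractions import Fraction

# Toy heap: address -> share in (0, 1].

def quantum(h, pi):
    """|h|_pi: how many pi-sized pieces can be taken out of h."""
    return sum(int(s // pi) for s in h.values())

def join(h1, h2):
    """Partial ⊕ on heaps: add shares pointwise, undefined past full."""
    out = dict(h1)
    for a, s in h2.items():
        out[a] = out.get(a, Fraction(0)) + s
        if out[a] > 1:
            return None
    return out

pi = Fraction(1, 4)
h1 = {"x": Fraction(1, 2)}
h2 = {"x": Fraction(1, 4), "y": Fraction(1, 4)}   # non-empty and pi-uniform
h3 = join(h1, h2)

# Fact 1: subheaps do not have larger pi-quanta than their parents.
assert quantum(h1, pi) <= quantum(h3, pi)
# Fact 2: joining on a non-empty pi-uniform piece strictly increases |.|_pi,
# which is why rule W follows by induction on |h|_pi.
assert quantum(h3, pi) > quantum(h1, pi)
```
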

### **8 Lower Bounds on Predicate Multiplication**

In Sect. 7 we gave a model for the logical axioms we presented in Fig. 3 and on page 13. Our goal here is to show that it is difficult to do better, *e.g.* by having a premise-free DotStar rule or a bidirectional DotImpl rule. In Sect. 8.1 we show that these logical rules force properties on the share model. In Sect. 8.2 we show that disjointness restricts the class of share models: there are no non-trivial models that have left inverses or that satisfy both left and right distributivity.

### **8.1 Predicate Multiplication's Axioms Force Share Model Properties**

The SSA structures we gave in Sect. 7.3 are good for building models that enable the rules for predicate multiplication from Fig. 3. However, since they impose intermediate algebraic and logical signatures between the concrete model and the rules for predicate multiplication, they are not good for showing that we cannot do better. Accordingly, here we disintermediate and focus on the concrete model A ⇀fin (S+ × V), that is, finite partial maps from addresses to pairs of positive shares and values. The join operation on heaps operates pointwise [21], with (π1, v1) ⊕ (π2, v2) = (π3, v3) def= π1 ⊕S π2 = π3 ∧ v1 = v2 = v3, from which we derive the usual SA model for ⋆ and emp (Sect. 7.1). We define h ⊨ x ↦^π y def= *dom*(h) = {x} ∧ h(x) = (π, y). We define scalar multiplication over heaps ⊗H pointwise as well, with π1 ⊗ (π2, v) def= (π1 ⊗S π2, v), and then define predicate multiplication by h ⊨ π · P def= ∃h′. π ⊗H h′ = h ∧ h′ ⊨ P. All of the above definitions are standard except for ⊗H, which strikes us as the only choice (up to commutativity), and predicate multiplication itself.

By Sect. 7 we already know that this model satisfies the rules for predicate multiplication, given the assumptions on the share model from Sect. 7.2. What is interesting is that we can prove the other direction: if we assume that the key logical rules from Fig. 3 hold, they force axioms on the share model. The key correspondences are: DotFull forces that F is the left identity of ⊗S; DotMapsTo forces that F is the right identity of ⊗S; DotDot forces the associativity of ⊗S; one direction of DotConj forces the right cancellativity of ⊗S (as do DotImpl and one direction of DotUniv); and DotPlus forces right distributivity of ⊗S over ⊕S.

The following rules force left distributivity of ⊗<sup>S</sup> over ⊕<sup>S</sup> and left ⊗<sup>S</sup> inverses:

$$\frac{}{\pi \cdot (P \star Q) \dashv\vdash (\pi \cdot P) \star (\pi \cdot Q)}\;\textsc{DotStar}' \qquad\qquad \frac{}{\pi \cdot (P \Rightarrow Q) \dashv\vdash (\pi \cdot P) \Rightarrow (\pi \cdot Q)}\;\textsc{DotImpl}'$$

One direction of DotStar′ also forces that ⊕S satisfies disjointness; this is the key reason that we cannot use the rationals ⟨(0, 1], +, ×⟩. Clearly the side-condition-free DotStar′ rule would be preferable to the DotStar rule in Fig. 3, and it would also be preferable to have bidirectionality for predicate multiplication over implication and negation. Unfortunately, as we will see shortly, the disjointness of ⊕S places strong multiplicative algebraic constraints on the share model. These constraints are the reason we cannot support the DotImpl′ rule and why we require the π′-uniformity side condition in our DotStar rule.

### **8.2 Disjointness in a Multiplicative Setting**

Our goal now is to explore the algebraic consequences of the disjointness property in a multiplicative setting. Suppose ⟨S, ⊕⟩ is a CSA with a single unit E, a top element F, and ⊕-complements s̄. Suppose further that shares satisfy the disjointness property a ⊕ a = b ⇒ a = E. For the multiplicative structure, assume ⟨S, ⊗, F⟩ is a monoid (*i.e.* the axioms forced by the DotDot, DotMapsTo, and DotFull rules). It is undesirable for a share model if multiplying two positive shares (*e.g.* the ability to read a memory cell) results in the empty permission, so we assume that when π1 and π2 are non-E their product π1 ⊗ π2 ≠ E.

Now add left or right distributivity. We choose right distributivity (s1 ⊕ s2) ⊗ s3 = (s1 ⊗ s3) ⊕ (s2 ⊗ s3); the situation is mirrored with left. Let us show that we cannot have left inverses for π ≠ F. We proceed by contradiction: suppose π ≠ F and there exists π⁻¹ such that π⁻¹ ⊗ π = F. Then

$$
\pi = \mathcal{F} \otimes \pi = \left(\pi^{-1} \oplus \overline{\pi^{-1}}\right) \otimes \pi = \left(\pi^{-1} \otimes \pi\right) \oplus \left(\overline{\pi^{-1}} \otimes \pi\right) = \mathcal{F} \oplus \left(\overline{\pi^{-1}} \otimes \pi\right).
$$

Let e = π̄⁻¹ ⊗ π (the rightmost summand above). Now π = F ⊕ e = (e ⊕ ē) ⊕ e = (e ⊕ e) ⊕ ē, which by associativity and disjointness forces e = E, which in turn forces π = F ⊕ E = F, a contradiction.

Now suppose that instead of adding multiplicative inverses we have both left and right distributivity. First we prove (Lemma 1) that for arbitrary s ∈ S, s ⊗ s̄ = s̄ ⊗ s. We calculate:

$$(s \otimes s) \oplus (s \otimes \overline{s}) = s \otimes (s \oplus \overline{s}) = s \otimes \mathcal{F} = s = \mathcal{F} \otimes s = (s \oplus \overline{s}) \otimes s = (s \otimes s) \oplus (\overline{s} \otimes s)$$

Lemma 1 follows by the cancellativity of ⊕ between the far left and the far right. Now we show (Lemma 2) that s ⊗ s̄ = E. We calculate:

$$\begin{aligned} \mathcal{F} = \mathcal{F} \otimes \mathcal{F} = (s \oplus \overline{s}) \otimes (s \oplus \overline{s}) &= (s \otimes s) \oplus (s \otimes \overline{s}) \oplus (\overline{s} \otimes s) \oplus (\overline{s} \otimes \overline{s}) \\ &= (s \otimes s) \oplus \underline{(s \otimes \overline{s}) \oplus (s \otimes \overline{s})} \oplus (\overline{s} \otimes \overline{s}) \end{aligned}$$

The final equality is by Lemma 1. The underlined portion implies s ⊗ s̄ = E by disjointness. The upshot of Lemma 2, together with our requirement that the product of two positive shares be positive, is that we can have no more than the two elements E and F in our share model: any other element s would make both s and s̄ positive while their product is E. Since the entire motivation for fractional share models is to allow ownership between E and F, we must choose either left or right distributivity; we choose right since we are able to prove that the π-uniformity side condition enables the bidirectional DotStar.
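Lemma 2 can be checked concretely on a small model. The following Python sketch (ours, purely for illustration) uses subset shares over {0, 1}, with ⊕ as disjoint union and ⊗ as intersection: disjointness and both distributivities hold, and, exactly as Lemma 2 predicts, every s satisfies s ⊗ s̄ = E, so the two positive "halves" multiply to the empty share.

```python
from itertools import combinations

U = frozenset({0, 1})
S = [frozenset(c) for r in range(3) for c in combinations(U, r)]
E, F = frozenset(), U

def oplus(a, b):
    # join: defined only on disjoint shares (partial operation)
    return a | b if not (a & b) else None

def otimes(a, b):
    # multiplication: intersection; F = U is its unit
    return a & b

# disjointness: a ⊕ a defined  ⇒  a = E
assert all(oplus(a, a) is None or a == E for a in S)

# right distributivity holds (left follows since ⊗ is commutative here)
for a in S:
    for b in S:
        if oplus(a, b) is None:
            continue
        for c in S:
            assert otimes(oplus(a, b), c) == oplus(otimes(a, c), otimes(b, c))

# Lemma 2: s ⊗ s̄ = E for every s ...
assert all(otimes(s, U - s) == E for s in S)

# ... so the two positive half shares multiply to the empty permission
half, other = frozenset({0}), frozenset({1})
assert otimes(half, other) == E
```

This is precisely why a model with both distributivities and disjointness cannot keep products of positive shares positive.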

### **9 Related Work**

Fractional permissions are widely used to reason about resource ownership in concurrent programs. The well-known rational model ⟨[0, 1], +⟩ of Boyland *et al.* [5] is used to reason about join-fork programs. This structure has the disjointness problem mentioned in Sect. 2, first noticed by Bornat *et al.* [4], as well as other problems discussed in Sects. 3 and 4 and in [2]. Boyland [6] extended the framework to scale permissions uniformly over arbitrary predicates with multiplication, *e.g.*, he defined π · P as "multiply each permission in P by π". However, his framework does not fit into SL and his scaling rules are not bidirectional. Jacobs and Piessens [28] also used rationals for scaling permissions π · P in SL but only obtained one direction of DotStar and DotPlus. A different kind of scaling permission was used by Dinsdale-Young *et al.* [20], who used rationals to define permission assertions [A]<sub>r</sub><sup>π</sup> indicating that a thread with permission π can execute the action A over the shared region r.

There are other flavors of permission besides rationals. Bornat *et al.* [4] introduced integer counting permissions ⟨ℤ, +, 0⟩ to reason about semaphores and combined rationals and integers into a hybrid permission model. Heule *et al.* [23] flexibly allowed permissions to be either concretely rational or abstractly read-only to lower the nuisance of detailed accounting. A more general form of read-only permission was proposed by Charguéraud and Pottier [13], which transforms a predicate P into a read-only mode RO(P) that can be duplicated/merged with the bi-entailment RO(P) ⊣⊢ RO(P) ∗ RO(P). Their permissions distribute pleasantly over disjunction and existential quantification but only work one way for ∗, *i.e.*, RO(H₁ ∗ H₂) ⊢ RO(H₁) ∗ RO(H₂). Parkinson [41] proposed subsets of the natural numbers 𝒫(ℕ) as shares to fix the disjointness problem. Compared to tree shares, Parkinson's model is less practical computationally and does not have an obvious multiplicative structure.

Protocol-based logics like FCSL [38] and Iris [30] have been very successful in reasoning about fine-grained concurrent programs, but their high expressivity results in a heavyweight logic. Automation (*e.g.* inference such as we do in Sect. 4) has been hard to come by. We believe that fractional permissions and protocol-based logics are in a meaningful sense complementary rather than competitors.

Verification tools often implement rational permissions because of their simplicity. For example, VeriFast [29] uses rationals to verify programs with locks and semaphores. It also allows simple and restrictive forms of scaling permissions which can be applied uniformly over standard predicates. On the other hand, HIP/SLEEK [31] uses rationals to model "thread as resource" so that the ownership of a thread and its resources can be transferred. Chalice [36] has rational permissions to verify properties of multi-threaded, object-based programs such as absence of data races and deadlocks. Viper [37] has an expressive intermediate language that supports both rational and abstract permissions. However, a number of verification tools have chosen tree shares due to their better metatheoretical properties. VST [3] is equipped with tree share permissions and an extensive tree share library. HIP/SLEEK uses tree shares to verify the barrier structure [26] and has its own complete share solver [33,35] that reduces tree formulae to Boolean formulae handled by Z3 [17]. Lastly, tree share permissions are featured in Heap-Hop [47] to reason about asynchronous communication.

### **10 Conclusion**

We presented a separation logic proof framework to reason about resource sharing using fractional permissions in concurrent verification. We support sophisticated verification tasks such as inductive predicates, proving predicates precise, and biabduction. We wrote ShareInfer to gauge how our theories could be automated. We developed scaling separation algebras as compositional models for our logic. We investigated why our logic cannot support certain desirable properties.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Deadlock-Free Monitors**

Jafar Hamin and Bart Jacobs

imec-DistriNet, Department of Computer Science, KU Leuven, Celestijnenlaan 200A, 3001 Heverlee, Belgium {jafar.hamin,bart.jacobs}@cs.kuleuven.be

**Abstract.** Monitors are one of the common techniques for synchronizing threads in multithreaded programs: calling a wait command on a condition variable suspends the calling thread, and notifying a condition variable causes the threads waiting for that condition variable to resume their execution. One potential problem with such programs is that a waiting thread might be suspended forever, leading to deadlock, a state where each thread of the program is waiting for a condition variable or a lock. In this paper, a modular verification approach for deadlock-freedom of such programs is presented, ensuring that in any state of the execution of the program, if some threads are suspended then there exists at least one thread running. The main idea behind this approach is to make sure that for any condition variable v for which a thread is waiting there exists a thread obliged to fulfil an obligation for v that only waits for a waitable object whose wait level, an arbitrary number associated with each waitable object, is less than the wait level of v. The relaxed precedence relation introduced in this paper, aiming to avoid cycles, can also benefit other verification approaches that verify deadlock-freedom of other synchronization constructs such as channels and semaphores, enabling them to accept a wider range of deadlock-free programs. We encoded the proposed proof rules in the VeriFast program verifier and, by defining appropriate invariants for the locks associated with condition variables, succeeded in verifying some popular use cases of monitors, including unbounded/bounded buffers, sleeping barber, barrier, and readers-writers locks. A soundness proof for the presented approach is provided; some of the trickiest lemmas in this proof have been machine-checked with Coq.

### **1 Introduction**

One of the popular mechanisms for synchronizing threads in multithreaded programs is monitors, a synchronization construct allowing threads to have mutual exclusion and also the ability to wait for a certain condition to become true. These constructs, consisting of a mutex/lock and some condition variables, provide some basic functions for their clients, namely wait(v,l), causing the calling thread to wait for the condition variable v and release lock l while doing so, and notify(v)/notifyAll(v), causing one/all thread(s) waiting for v to resume their execution. Each condition variable is associated with a lock; a thread must acquire the associated lock before waiting or notifying on a condition variable, and when a thread is notified it must reacquire the associated lock.

However, one potential problem with these synchronizers is deadlock, where all threads of the program are waiting for a condition variable or a lock. To clarify the problem consider the program in Fig. 1, where a channel consists of a queue q, a lock l and a condition variable v, protecting a thread from dequeuing q when it is empty. In this program the receiver thread first acquires lock l and while there is no item in q it releases l, suspends itself and waits for a notification on v. If this thread is notified while q is not empty it dequeues an item and finally releases l. The sender thread also acquires the same lock, enqueues an item into q, notifies one of the threads waiting for v, if any, and lastly releases l. After creating a channel ch, the main thread of the program first forks a thread to receive a message from ch and then sends a message on ch. Although this program is deadlock-free, it is easy to construct some variations of it that lead to deadlock: if the main thread itself, before sending any messages, tries to receive a message from ch, or if the number of receives is greater than the number of sends, or if the receiver thread waits for v even if q is not empty.


**Fig. 1.** A message passing program synchronized using a monitor
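The program of Fig. 1 can be rendered directly in Python with `threading.Condition` (our sketch; the class and method names are ours, chosen to match the figure's q, l, and v):

```python
import threading
from collections import deque

class Channel:
    def __init__(self):
        self.q = deque()                      # queue q
        self.l = threading.Lock()             # lock l
        self.v = threading.Condition(self.l)  # condition variable v, tied to l

    def send(self, msg):
        with self.l:                # acquire l
            self.q.append(msg)      # enqueue an item
            self.v.notify()         # wake one thread waiting for v, if any
        # l is released on leaving the with-block

    def receive(self):
        with self.l:
            while not self.q:       # while q is empty ...
                self.v.wait()       # ... release l, suspend until notified
            return self.q.popleft() # dequeue after reacquiring l

ch = Channel()
out = []
t = threading.Thread(target=lambda: out.append(ch.receive()))
t.start()                           # fork the receiver thread
ch.send(12)                         # then send a message on ch
t.join()
assert out == [12]
```

Note the `while` loop around `wait()`: waiting only while q is empty is exactly the waiting-condition pattern whose misuse (e.g. waiting unconditionally) produces the deadlocking variations described above.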

Several approaches have been presented to verify termination, deadlock-freedom, liveness, and finite blocking of threads. Some of these approaches only work with non-blocking algorithms [1–3], where the suspension of one thread cannot lead to the suspension of other threads. These approaches are not applicable to condition variables because suspension of a sender thread in Fig. 1, for example, might cause a receiver thread to be blocked forever. Other approaches verify termination of programs using blocking constructs such as channels [4–6] and semaphores [7]. These approaches are not general enough to cover condition variables because, unlike channels and semaphores, a notification of a condition variable is lost when there is no thread waiting for that condition variable. There are also some studies [8–10] on verifying correctness of programs that use condition variables. However, these approaches either only cover a very specific application of condition variables, such as a buffer program with only one producer and one consumer, or are not modular and suffer from long verification times when the size of the state space, such as the number of threads, is increased.

In this paper we present a modular approach to verify deadlock-freedom of programs in the presence of condition variables. More specifically, this approach makes sure that for any condition variable v for which a thread is waiting there exists a thread obliged to fulfil an obligation for v that only waits for a waitable object whose wait level, an arbitrary number associated with each waitable object, is less than the wait level of v. The presented approach is modular, meaning that different modules (functions) of a program can be verified individually. This approach is based on the approach of Leino *et al.* [4] for verification of deadlock-freedom in the presence of channels and locks, which in turn was based on Kobayashi's [6] type system for verifying deadlock-freedom of π-calculus processes, and extends the separation logic-based encoding [11] by covering condition variables. We implemented the proposed proof rules in the VeriFast verifier [12–14] and succeeded in verifying some common applications of condition variables such as bounded/unbounded buffers, sleeping barber [15], barrier, and readers-writers locks (see the full version of this paper [16], which reports the verification times of these programs).

This paper is structured as follows. Section 2 provides some background information on the existing approaches upon which we build our verification algorithm. Section 3 introduces a preliminary approach for verifying deadlock-freedom of some common applications of condition variables. In Sect. 4 the precedence relation, aiming to avoid cycles, is relaxed, making it possible to verify some trickier applications of condition variables. A soundness proof of the presented approach is lastly given in Sect. 5.

### **2 Background Information on the Underlying Approaches**

In this section we provide some background information on the existing approaches, on which we build, that verify absence of data races and deadlock in the presence of locks and channels.

### **2.1 Verifying Absence of Data Races**

Locks/mutexes are mostly used to avoid data races, an undesired situation where a heap location is written by one thread while being accessed concurrently by another. One common approach to verify absence of these undesired conditions is ownership: ownership of heap locations is assigned to threads and it is verified that a thread accesses only the heap locations that it owns. Transferring ownership of heap locations between threads is supported through locks by allowing locks, too, to own heap locations. While a lock is not held by a thread, it owns the heap locations described by its *invariant*. More specifically, when a lock is created the resources specified by its invariant are transferred from the creating thread to the lock, when that lock is acquired these resources are transferred from the lock to the acquiring thread, and when that lock is released these resources, which must again be in the possession of the thread, are transferred back from the thread to the lock [17]. Figure 2 illustrates how a program increasing a

```
x := newint(0);
{x → 0}
l := newlock;
{ulock(l) ∗ x → 0}
ct := counter(x:=x, l:=l);
{ulock(ct.l) ∗ ct.x → 0}
{ulock(ct.l) ∗ inv(ct)}
{lock(ct.l) ∧ I(l)=inv(ct)}
{lock(ct.l) ∗ lock(ct.l)}
fork (inc(ct));
{lock(ct.l)}
inc(ct)

routine inc(counter ct){
  {lock(ct.l) ∧ I(l)=inv(ct)}
  acquire(ct.l);
  {locked(ct.l) ∗ ∃z. ct.x → z}
  ct.x := ct.x + 1;
  {locked(ct.l) ∗ ∃z. ct.x → z}
  release(ct.l)
  {lock(ct.l)}
}
```

**Fig. 2.** Verification of data-race-freedom of a program, where inv = λct. ∃z. ct.x→z

counter, which consists of an integer variable x and a lock l protecting this variable, can be verified, where two threads try to write to the variable x. We use separation logic [18] to reason about the ownership of permissions. As indicated below each command, creating the integer variable x initialized to zero provides a read/write access permission to x, denoted by x → 0. This ownership, which is going to be protected by lock l, is transferred to the lock because it is asserted by the lock invariant inv, which is associated with the lock, as denoted by the function I, at the point where the lock is initialized. The resulting lock permission, which can be duplicated, is used in the routine inc, where x is increased under the protection of lock l. Acquiring this lock in this routine provides a full access permission to x and transforms the lock permission into a locked permission, implying that the related lock has been acquired. Releasing that lock again consumes this access permission and transforms the locked permission back into a lock permission.
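The proof outline in Fig. 2 corresponds to a program like the following Python sketch (ours); the lock invariant ∃z. ct.x → z is what makes it safe for either thread to write x while holding l:

```python
import threading

class Counter:
    def __init__(self):
        self.x = 0                 # x → 0: full permission held by creator
        self.l = threading.Lock()  # after initialization, ownership of x
                                   # conceptually moves into the lock

def inc(ct):
    # lock(ct.l): acquiring yields the invariant's resources, ∃z. ct.x → z
    with ct.l:
        ct.x += 1                  # write is safe: this thread owns x here
    # releasing returns ownership of x to the lock

ct = Counter()
t = threading.Thread(target=inc, args=(ct,))
t.start()   # fork(inc(ct)) — the (duplicable) lock permission goes to the child
inc(ct)
t.join()
assert ct.x == 2
```

Without the lock, the two increments could interleave and lose an update; in the logic, that shows up as neither thread being able to assert ownership of x.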

### **2.2 Verifying Absence of Deadlock**

One potential problem with programs using locks and other synchronization mechanisms is deadlock, an undesired situation where all threads of the program are waiting for some waitable objects. For example, a program can deadlock if a thread acquires a lock and forgets to release it, because any other thread waiting for that lock never succeeds in acquiring it. As another example, if in a message passing program the number of threads trying to receive a message from a channel is greater than the number of messages sent on that channel, there will be some threads waiting for that channel forever. One approach to verify deadlock-freedom of channels and locks, presented by Leino *et al.* [4], guarantees deadlock-freedom of programs by ensuring that (1) for any *obligee* thread waiting for a waitable object, such as a channel or lock, there is an *obligation* for that object that must be fulfilled by an *obligor* thread, where a thread can fulfil an obligation for a channel/lock by sending a message on that channel/releasing that lock, and (2) each thread waits for an object only if the *wait level* of that object, an arbitrary number assigned to each waitable object, is lower than the wait levels of all obligations of that thread. The second rule is established by making sure that when a thread with some obligations O executes a command acquire(o)/receive(o) the precondition o ≺ O holds, i.e. the wait level of o is lower than the wait levels of the obligations in O. To meet the first rule where the waitable object is a lock, as the example in the left side of Fig. 3 illustrates, after acquiring a lock, that lock is loaded onto the bag<sup>1</sup> (multiset) of obligations of the thread, denoted by obs(O). This ensures that if a thread tries to acquire a lock that has already been acquired then there is one thread obliged to fulfil an obligation for that lock.


**Fig. 3.** Verification of deadlock-freedom of locks (left side) and channels (right side)
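Rule (2) amounts to a runtime lock-ordering discipline. A minimal sketch (ours; `R`, `acquire_checked`, and `release_checked` are hypothetical names) that flags violations of the o ≺ O precondition:

```python
R = {'l_outer': 2, 'l_inner': 1}   # wait level of each waitable object
held = []                          # current thread's bag of obligations

def acquire_checked(o):
    # precondition o ≺ O: o's level must be below all held obligations'
    assert all(R[o] < R[h] for h in held), f"wait-level violation on {o}"
    held.append(o)                 # acquiring loads o onto the obligations

def release_checked(o):
    held.remove(o)                 # releasing fulfils the obligation for o

acquire_checked('l_outer')         # level 2, no obligations yet: fine
acquire_checked('l_inner')         # level 1 < 2: fine
release_checked('l_inner')
release_checked('l_outer')

acquire_checked('l_inner')         # level 1 acquired first ...
violated = False
try:
    acquire_checked('l_outer')     # ... then level 2: rejected
except AssertionError:
    violated = True
assert violated
```

The rejected order is exactly the one that, against another thread taking the locks in the opposite order, could deadlock.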

To establish the first rule where the waitable object is a channel, any thread trying to receive a message from a channel ch must spend one *credit* for ch. This credit is normally obtained from the thread that forked the receiver thread, where this credit is originally created by loading ch onto the bag of obligations of the forking thread. The forking thread can discharge the loaded obligation either by sending a message on the corresponding channel or by delegating it to a child thread that can discharge it. The example on the right side of Fig. 3 shows the verification of deadlock-freedom of a program in which the main routine, after forking an obligee thread trying to receive a message from channel ch, sends a message on this channel. Before forking the receiver thread, a credit and an obligation for the channel ch are created in the main thread. The former is given to the forked thread, where this credit is spent by the receive(ch) command, and the latter is fulfilled by the main thread when it executes the command send(ch, 12).

More formally, the mentioned verification approach satisfies the first rule by ensuring that for each channel ch in the program the number of obligations for ch is at least the number of threads waiting for ch. This assurance is obtained by preserving the invariant *Wt*(ch) + Ct(ch) ≤ Ot(ch) + sizeof(ch), while the programming language itself ensures that sizeof(ch) > 0 ⇒ *Wt*(ch) = 0, where sizeof is a function mapping each channel to the size of its queue, *Wt*(ch)

<sup>1</sup> We treat bags of waitable objects as functions from waitable objects to natural numbers.

is the total number of threads currently waiting for channel ch, Ot(ch) is the total number of obligations for channel ch held by all threads, and Ct(ch) is the total number of credits for channel ch currently in the system.
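The counting argument can be simulated directly. In this sketch (ours; the function names are illustrative), each operation updates the four quantities and asserts the invariant Wt(ch) + Ct(ch) ≤ Ot(ch) + sizeof(ch) after every step:

```python
from collections import defaultdict

Wt = defaultdict(int)    # threads waiting per channel
Ct = defaultdict(int)    # credits in the system per channel
Ot = defaultdict(int)    # outstanding obligations per channel
size = defaultdict(int)  # messages queued per channel

def ok(ch):
    return Wt[ch] + Ct[ch] <= Ot[ch] + size[ch]

def g_load(ch):          # create an obligation together with a credit
    Ot[ch] += 1; Ct[ch] += 1; assert ok(ch)

def send(ch):            # fulfils one obligation, enqueues one message
    Ot[ch] -= 1; size[ch] += 1
    if Wt[ch] > 0:       # the language wakes a waiting receiver ...
        Wt[ch] -= 1; size[ch] -= 1  # ... which dequeues the message
    assert ok(ch)

def receive_start(ch):   # spend a credit; block if the queue is empty
    Ct[ch] -= 1
    if size[ch] == 0:
        Wt[ch] += 1
        assert Ot[ch] > 0  # the invariant guarantees an obligor exists
    else:
        size[ch] -= 1
    assert ok(ch)

g_load('ch')             # main thread: obligation + credit for ch
receive_start('ch')      # forked receiver spends the credit and waits
assert Ot['ch'] > 0      # someone is still obliged to send
send('ch')               # main thread fulfils the obligation
```

The key step is the `assert Ot[ch] > 0` inside `receive_start`: with the queue empty, the invariant forces an outstanding obligation, i.e. a thread that will eventually send.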

### **2.3 Proof Rules**

The separation logic-based proof rules, introduced by Jacobs *et al.* [11], avoiding data races and deadlock in the presence of locks and channels are shown in Fig. 4, where R and I are functions mapping a waitable object to its wait level and a lock to its invariant, respectively, and g_initl and g_load are *ghost commands* used to initialize an uninitialized lock permission and to load a channel onto the bags of obligations and credits of a thread, respectively. When a lock is created, as shown in NewLock, an uninitialized lock permission ulock(l) is provided for that thread. Additionally, an arbitrary integer number z can be chosen as the wait level of that lock, which is stored in R. Note that the variable z in this rule is universally quantified over the rule, and different applications of the NewLock rule can use different values for this variable. The uninitialized lock permission, as shown in InitLock, can be converted to a normal lock permission lock(l) provided that the resources described by the invariant of that lock, stored in I, which must be in the possession of the thread, are transferred from the thread to the lock. By the rule Acquire, having a lock permission, a thread can acquire that lock if the wait levels of the obligations of that thread are all greater than the wait level of that lock. After acquiring the lock, the resources represented by the invariant of that lock are provided for the acquiring thread and the lock permission is converted to a locked permission. When a

```
NewLock    {true} newlock {λl. ulock(l) ∧ R(l)=z}
InitLock   {ulock(l) ∗ i} g_initl(l) {λ_. lock(l) ∧ I(l)=i}
Acquire    {lock(l) ∗ obs(O) ∧ l ≺ O} acquire(l) {λ_. obs(O ⊎ {[l]}) ∗ locked(l) ∗ I(l)}
Release    {obs(O) ∗ locked(l) ∗ I(l)} release(l) {λ_. obs(O − {[l]}) ∗ lock(l)}
NewChannel {true} newchannel {λch. R(ch)=z}
Send       {obs(O)} send(ch, v) {λ_. obs(O − {[ch]})}
Receive    {obs(O) ∗ credit(ch) ∧ ch ≺ O} receive(ch) {λ_. obs(O)}
Fork       from {a ∗ obs(O)} c {λ_. obs({[]})} conclude {a ∗ obs(O ⊎ O′)} fork(c) {λ_. obs(O′)}
DupLock    lock(l) ⇔ lock(l) ∗ lock(l)
LoadOb     {obs(O)} g_load(ch) {λ_. obs(O ⊎ {[ch]}) ∗ credit(ch)}
```

**Fig. 4.** Proof rules ensuring deadlock-freedom of channels and locks, where o ≺ O ⇔ ∀o′ ∈ O. R(o) < R(o′)

thread releases a lock, as shown in the rule Release, the resources indicated by the invariant of that lock, which must be in the possession of the releasing thread, are transferred from the thread to the lock and the locked permission is converted back to a lock permission. By the rule Receive a thread with obligations O can try to receive a message from a channel ch only if the wait level of ch is lower than the wait levels of all obligations in O. This thread must also spend one credit for ch, ensuring that there is another thread obliged to fulfil an obligation for ch. As shown in the rule Send, an obligation for this channel can be discharged by sending a message on that channel. Alternatively, by the rule Fork, a thread can discharge an obligation for a channel if it delegates that obligation to a child thread, provided that the child thread discharges the delegated obligation. In this setting the verification of a program starts with an empty bag of obligations and must also end with such a bag, implying that there is no remaining obligation to fulfil.

However, this verification approach is not straightforwardly applicable to condition variables. A command notify cannot be treated like a command send because a notification on a condition variable is lost when there is no thread waiting for that variable. Accordingly, it does not make sense to discharge an obligation for a condition variable whenever it is notified. Similarly, a command wait cannot be treated like a command receive. A command wait is normally executed in a while loop, checking the *waiting condition* of the related condition variable. Accordingly, it is impossible to build a loop invariant for such a loop if we force the wait command to spend a credit for the related condition variable.

### **3 Deadlock-Free Monitors**

### **3.1 High-Level Idea**

In this section we introduce an approach to verify deadlock-freedom of programs in the presence of condition variables. This approach ensures that the verified program never deadlocks, i.e. there is always a running thread that is not blocked, until the program terminates. The main idea behind this approach is to make sure that for any condition variable v for which a thread is waiting there exists a thread obliged to fulfil an obligation for v that only waits for a waitable object whose wait level is less than the wait level of v. As a consequence, if the program has some threads suspended, waiting for some obligations, there is always a non-suspended thread obliged to fulfil the obligation omin, where omin has a minimal wait level among all waitable objects for which a thread is waiting. Accordingly, the proposed proof rules make sure that (1) when a command wait(v,l) is executed Ot(v) > 0, where Ot maps each condition variable v to the total number of obligations for v held by all threads (note that having a thread with permission obs(O) implies O(v) ≤ Ot(v)), (2) a thread discharges an obligation for a condition variable only if after this discharge the invariant one_ob(v, *Wt*, Ot), defined as *Wt*(v) > 0 ⇒ Ot(v) > 0, still holds, where *Wt*(v) denotes the number of threads waiting for condition variable v, and (3) a thread with obligations O executes a command wait(v,l) only if v ≺ O.
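Rules (1) and (2) can be prototyped as runtime checks over the bags Wt and Ot. The sketch below is ours; `g_chrg`/`g_disch` are named after the ghost commands of Sect. 3.4, the rest of the names are illustrative:

```python
from collections import Counter

Wt, Ot = Counter(), Counter()  # waiting threads / obligations per cond. var

def g_chrg(v):                 # take on an obligation for v
    Ot[v] += 1

def g_disch(v):
    # rule (2): one_ob must survive the discharge
    assert Wt[v] == 0 or Ot[v] > 1, "discharge would strand waiters"
    Ot[v] -= 1

def wait(v):
    assert Ot[v] > 0           # rule (1): an obligor must exist
    Wt[v] += 1

def notify(v):
    if Wt[v] > 0:
        Wt[v] -= 1             # one waiter wakes up

g_chrg('v')    # an obligor takes the obligation: Ot(v) = 1
wait('v')      # a waiter may now suspend, since Ot(v) > 0
notify('v')    # the obligor signals the waiter ...
g_disch('v')   # ... and only now may it discharge the obligation
assert Wt['v'] == 0 and Ot['v'] == 0
```

Swapping the last two calls trips the assertion in `g_disch`: discharging while Wt(v) = 1 and Ot(v) = 1 would leave a waiter with no obligor.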

### **3.2 Tracking Numbers of Waiting Threads and Obligations**

For all condition variables associated with a lock l, the values of the functions *Wt* and Ot can only be changed by a thread that has locked l; *Wt*(v) changes only when one of the commands wait(v,l)/notify(v)/notifyAll(v) is executed, which requires holding lock l, and we allow Ot(v) to change only when a locked permission for l is available. Accordingly, when a thread acquires a lock these two bags are stored in the related locked permission and are used to establish rules (1) and (2) when a thread executes a wait command or discharges one of its obligations. Note that the domain of these functions is the set of condition variables associated with the related lock. The thread executing the critical section can change these two bags under some circumstances: if it loads/discharges a condition variable onto/from the bag of its obligations, this condition variable must also be loaded/discharged onto/from the bag Ot stored in the related locked permission. Note that unlike the approach presented by Leino *et al.* [4], an obligation for a condition variable can be loaded or discharged arbitrarily by a thread, provided that rule (2) is respected. At the start of the execution of a wait(v,l) command, *Wt*(v) is incremented, and after execution of the commands notify(v)/notifyAll(v) one/all instance(s) of v is/are removed from the bag *Wt* stored in the related locked permission, since these commands change the number of threads waiting for v.

A program can be successfully verified according to the mentioned rules, formally stated in Fig. 5, if each lock associated with a condition variable v has an invariant strong enough to imply the desired invariant one_ob(v, *Wt*, Ot). Accordingly, the proof rules allow lock invariants to be parametrized over the bags *Wt* and Ot. When a thread acquires a lock, the result of applying the invariant of that lock to the two bags stored in the related locked permission is provided for the thread, and when that lock is released it is expected that the result of applying the lock invariant to those bags again holds. However, before execution of a command wait(v,l), when lock l with bags *Wt* and Ot stored in its locked permission is about to be released, it is expected that the invariant of l holds with bags *Wt* ⊎ {[v]} and Ot, because the running thread is going to wait for v and this condition variable is about to be added to *Wt*. When this thread resumes its execution, with some bags *Wt* and Ot stored in the related locked permission, the result of applying the invariant of l to these bags is provided for the thread. Note that the total number of threads waiting for v, *Wt*(v), is already decreased when a command notify(v) or notifyAll(v) is executed, causing the waiting thread(s) to wake up and try to reacquire the lock associated with v.

### **3.3 Resource Transfer on Notification**

In general, as we will see when looking at examples, it is sometimes necessary to transfer resources from a notifying thread to the threads being notified<sup>2</sup>.

<sup>2</sup> This transfer is only sound in the absence of spurious wake-ups, where a thread is awoken from its waiting state even though no thread has signaled the related condition variable.

To this end, these resources, specified by a function M, are associated with each condition variable v when v is created, such that the commands notify(v)/notifyAll(v) consume one/*Wt*(v) instance(s) of these resources, respectively, and the command wait(v,l) produces one instance of such resources (see the rules Wait, Notify, and NotifyAll in Fig. 5).

NewLock {true} newlock {λl. ulock(l, {[]}, {[]}) <sup>∧</sup> <sup>R</sup>(l)=z}

NewCv {true} newcond {λv. <sup>R</sup>(v)=<sup>z</sup> <sup>∧</sup> <sup>L</sup>(v)=<sup>l</sup> <sup>∧</sup> <sup>M</sup>(v)=m}

Acquire {lock(l) <sup>∗</sup> obs(O) <sup>∧</sup> <sup>l</sup>≺O} acquire(l) {λ . ∃*Wt*, Ot. locked(l, *Wt*, Ot) ∗ I(l)(*Wt*, Ot) ∗ obs(O{[l}] )}

Release

{locked(l, *Wt*, Ot) ∗ I(l)(*Wt*, Ot) ∗ obs(O{[l}] )} release(l) {λ . lock(l) ∗ obs(O)}

Wait {locked(l, *Wt*, Ot) ∗ I(l)(*Wt*{[v}] , Ot) ∗ obs(O{[l}] ) ∧ l=L(v) ∧ v≺O ∧ l≺O ∧ safe obs(v, *Wt*{[v}] , Ot)} wait(v, l) {λ . obs(O{[l}] ) ∗ ∃*Wt*- , Ot- . locked(l, *Wt*- , Ot- ) ∗ I(l)(*Wt*- , Ot- ) ∗ M(v)}

Notify {locked(L(v), *Wt*, Ot) <sup>∗</sup> (*Wt*(v)=0 <sup>∨</sup> <sup>M</sup>(v))} notify(v) {λ . locked(L(v), *Wt*−{[v}] , Ot)}

NotifyAll

$$\{\mathsf{locked}(\mathsf{L}(v), Wt, Ot) \* (\mathop{\mathrm{\mathrm{\tiny{\tiny{\mathrm{\tiny{\mathrm{\mathrm{\tiny{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\mathrm{\cdot}{\mathrm{\cdot}}}}}}}}}}}}}}}}}{}}}} $$

InitLock

```
{ulock(l, Wt, Ot) ∗ inv(Wt, Ot) ∗ obs(O)} g initl(l) {λ . lock(l) ∗ obs(O) ∧ I(l)=inv}
```

```
ChargeOb {obs(O) ∗ ulock/locked(L(v), Wt, Ot)} g chrg(v)
            {λ . obs(O{[v}] ) ∗ ulock/locked(L(v), Wt, Ot{[v}] )}
```

```
DisOb {obs(O) ∗ ulock/locked(L(v), Wt, Ot) ∧ safe obs(v, Wt, Ot − {[v]})}
          g disch(v) {λ_. obs(O − {[v]}) ∗ ulock/locked(L(v), Wt, Ot − {[v]})}
```
**Fig. 5.** Proof rules to verify deadlock-freedom of condition variables, where *Wt*(v) and Ot(v) denote the total number of threads waiting for v and the total number of obligations for v, respectively, and safe obs(v, *Wt*, Ot) ⇔ one ob(v, *Wt*, Ot) and one ob(v, *Wt*, Ot) ⇔ (*Wt*(v) > 0 ⇒ Ot(v) > 0)

#### **3.4 Proof Rules**

Figure 5 shows the proposed proof rules used to verify deadlock-freedom of condition variables, where L and M are functions mapping each condition variable to its associated lock and to the resources that are moved from the notifying thread to the notified one when that condition variable is notified, respectively. Creating a lock, as shown in the rule NewLock, produces a permission ulock storing the bags *Wt* and Ot, both initially empty. The bag Ot in this permission, as in a locked permission, can be changed provided that the obligations of the running thread are updated accordingly by one of the ghost commands g chrg(v) or g disch(v) (see rules ChargeOb and DisOb). The lock related to this permission can be initialized by transferring the resources described by the invariant of this lock, now parametrized over the bags *Wt* and Ot and applied to the bags stored in this permission, from the thread to the lock (see rule InitLock). When this lock is acquired, as shown in the rule Acquire, the resources indicated by its invariant are provided to the thread, and when it is released, as shown in the rule Release, the resources described by its invariant, which must hold with appropriate bags, are transferred back from the thread to the lock. The rules Wait and DisOb ensure that for any condition variable v, when the number of waiting threads is increased by executing a command wait(v,l), or the number of obligations is decreased by (logically) executing a command g disch(v), the desired invariant one ob still holds. Additionally, the rules Acquire and Wait make sure that a thread only waits for a waitable object whose wait level is lower than the wait levels of the obligations of that thread. Note that in the precondition of the command wait(v,l) in the rule Wait it is not necessary that the wait level of v be lower than the wait level of l, since lock l is going to be released by this command.
However, in this precondition the wait level of l must be lower than the wait levels of the obligations of the thread, because when this thread is notified it tries to reacquire l, at which point l≺O must hold. The commands notify(v)/notifyAll(v), as shown in the rules Notify and NotifyAll, remove one/all instance(s) of v, if any, from the bag *Wt* stored in the related locked permission. Additionally, notify(v) consumes the moving resources, indicated by M(v), that appear in the postcondition of the notified thread. Note that notifyAll(v) consumes *Wt*(v) instances of these resources, since they are transferred to the *Wt*(v) threads waiting for v.

### **3.5 Verifying Channels**

**Ghost Counters.** We will now use our proof system to prove deadlock-freedom of the program in Fig. 1. To do so, however, we will introduce a *ghost resource* that plays the role of *credits*, in such a way that we can prove the invariant *Wt*(ch) + Ct(ch) ⩽ Ot(ch) + sizeof(ch). In particular, we want this property to follow from the lock invariant. This means we need to be able to talk, in the lock invariant, about the total number of credits in the system. To achieve this, we introduce a notion of *ghost counters* and corresponding *ghost counter tickets*, both of which are a particular kind of ghost resource. Specifically, we introduce three ghost commands: g newctr, g inc, and g dec. g newctr allocates a new ghost counter whose *value* is zero and returns a *ghost counter identifier* c for it. g inc(c) increments the value of the ghost counter with identifier c and produces a *ticket* for the counter. g dec(c), finally, consumes a ticket for ghost counter c and decrements the ghost counter's value. Since these are the only operations that manipulate ghost counters or ghost counter tickets, it follows that the value of a ghost counter c is always equal to the number of tickets for c in the system. Proof rules for these ghost commands are shown in Fig. 6<sup>3</sup>.

NewCounter {true} g newctr {λc. ctr(c, 0)}

IncCounter {ctr(c, n)} g inc(c) {λ_. ctr(c, n+1) ∗ tic(c)}

DecCounter {ctr(c, n) ∗ tic(c)} g dec(c) {λ_. ctr(c, n−1) ∧ 0<n}

**Fig. 6.** Ghost counters
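For intuition, the rules of Fig. 6 can be mirrored by a small runtime model (ours, for illustration only; actual ghost state exists only in proofs and has no runtime cost):

```python
import itertools

class GhostCounters:
    """Runtime model of the ghost-counter rules of Fig. 6 (g_newctr,
    g_inc, g_dec). The assertions mirror the key fact: a counter's
    value always equals the number of outstanding tickets for it."""
    def __init__(self):
        self._next = itertools.count()
        self._value = {}

    def g_newctr(self):            # NewCounter: {true} g_newctr {ctr(c, 0)}
        c = next(self._next)
        self._value[c] = 0
        return c

    def g_inc(self, c):            # IncCounter: increments and yields tic(c)
        self._value[c] += 1
        return ("tic", c)

    def g_dec(self, c, ticket):    # DecCounter: consumes a ticket, so 0 < n
        assert ticket == ("tic", c)
        assert self._value[c] > 0  # a ticket exists, hence the value is positive
        self._value[c] -= 1

g = GhostCounters()
c = g.g_newctr()
t1 = g.g_inc(c)
g.g_dec(c, t1)
print(g._value[c])  # -> 0
```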

**The Channels Proof.** Figure 7 illustrates how the program in Fig. 1 can be verified using our proof system. The invariant of lock ch.l in this program, denoted by inv(ch), is parametrized over bags *Wt*, Ot and implies the desired invariant one ob(ch.v, *Wt*, Ot). The permission ctr(ch.c, Ctv) in this invariant indicates that the total number of credits (tickets) for ch.v is Ctv, where ch.c is a *ghost field* added to the channel data structure, aiming to store a ghost counter identifier for the ghost counter of ch.v. Generally, a lock invariant can imply the invariant one ob(v, *Wt*, Ot) if it asserts *Wt*(v) + Ct(v) ⩽ Ot(v) + S(v) and *Wt*(v) ⩽ Ot(v), where Ct(v) is the total number of credits for v and S(v) is an integer value such that the command wait(v,l) is executed only if S(v) ⩽ 0. After initializing l in the main routine, there exists a credit for ch.v (denoted by tic(ch.c)) that is consumed by the thread executing the receive routine, and also an obligation for ch.v that is fulfilled by this thread after executing the send routine. The credit tic(ch.c) in the precondition of the routine receive ensures that before execution of the command wait(ch.v, ch.l), Ot(ch.v) > 0. This inequality follows from the invariant of lock l, which holds for *Wt* ⊎ {[ch.v]} and Ot when Ctv is decreased by g dec(ch.c). This credit (or the one specified by M(ch.v) that is moved from a notifier thread when the receiver thread wakes up) must be consumed after execution of the command dequeue(ch.q) and before releasing ch.l, to make sure that the invariant still holds after decreasing the number of items in ch.q. The obligation for ch.v in the precondition of the routine send is discharged by this routine, which is safe, since after the execution of the commands enqueue and notify the invariant one ob(ch.v, *Wt*, Ot − {[ch.v]}), which follows from the lock invariant, holds.

<sup>3</sup> Some logics for program verification, such as Iris [19], include general support for defining ghost resources such as our ghost counters. In particular, our ghost counters can be obtained in Iris as an instance of the *authoritative monoid* [19, p. 5].

inv(channel ch) ::= λ*Wt*. λOt. ∃Ctv. ctr(ch.c, Ctv) ∗ ∃s. queue(ch.q, s) ∧ L(ch.v)=ch.l ∧ M(ch.v)=tic(ch.c) ∧ *Wt*(ch.v) + Ctv ⩽ Ot(ch.v) + s ∧ *Wt*(ch.v) ⩽ Ot(ch.v)

```
routine main(){{obs({[]})}
q:=newqueue; l:=newlock; v:=newcond; c:=g newctr; g inc(c);
{obs({[]}) ∗ ulock(l, {[]}, {[]}) ∗ queue(q, 0) ∗ ctr(c, 1) ∗ tic(c)
∧ L(v)=l ∧ M(v)=tic(c) ∧ R(l)=0 ∧ R(v)=1}
ch:=channel(q, l, v); ch.c:=c;
{obs({[]}) ∗ ulock(l, {[]}, {[]}) ∗ inv(ch)({[]}, {[v]}) ∗ tic(c)} g chrg(v);
{obs({[v]}) ∗ ulock(l, {[]}, {[v]}) ∗ inv(ch)({[]}, {[v]}) ∗ tic(c)} g initl(l);
{obs({[v]}) ∗ lock(l) ∗ tic(c) ∧ I(l)=inv(ch)}
fork (receive(ch));
```

```
{obs({[v]}) ∗ lock(l)}
```

```
send(ch, 12) {obs({[]})}}
```

```
routine receive(channel ch){
```
{obs(O) ∗ tic(ch.c) ∗ lock(ch.l) ∧ ch.l≺O ∧ ch.v≺O ∧ I(ch.l)=inv(ch)}
acquire(ch.l);
{obs(O ⊎ {[ch.l]}) ∗ tic(ch.c) ∗ ∃*Wt*, Ot. locked(ch.l, *Wt*, Ot) ∗ inv(ch)(*Wt*, Ot)}
while(sizeof(ch.q) = 0){
g dec(ch.c);
{obs(O ⊎ {[ch.l]}) ∗ ∃*Wt*, Ot. locked(ch.l, *Wt*, Ot) ∗ inv(ch)(*Wt* ⊎ {[ch.v]}, Ot)}
wait(ch.v, ch.l)
{obs(O ⊎ {[ch.l]}) ∗ M(ch.v) ∗ ∃*Wt*, Ot. locked(ch.l, *Wt*, Ot) ∗ inv(ch)(*Wt*, Ot)}};
dequeue(ch.q); g dec(ch.c);
{obs(O ⊎ {[ch.l]}) ∗ ∃*Wt*, Ot. locked(ch.l, *Wt*, Ot) ∗ inv(ch)(*Wt*, Ot)}
release(ch.l)
{obs(O) ∗ lock(ch.l)}}

**routine** send(channel ch, int d){
{obs(O ⊎ {[ch.v]}) ∗ lock(ch.l) ∧ ch.l ≺ O ⊎ {[ch.v]} ∧ I(ch.l)=inv(ch)}
acquire(ch.l);
{obs(O ⊎ {[ch.v, ch.l]}) ∗ ∃*Wt*, Ot. locked(ch.l, *Wt*, Ot) ∗ inv(ch)(*Wt*, Ot)}
enqueue(ch.q, d);
if (*Wt*(ch.v)>0) g inc(ch.c);
notify(ch.v);
{obs(O ⊎ {[ch.v, ch.l]}) ∗ ∃*Wt*, Ot. locked(ch.l, *Wt*, Ot) ∗ inv(ch)(*Wt*, Ot − {[ch.v]})}
g disch(ch.v);
{obs(O ⊎ {[ch.l]}) ∗ ∃*Wt*, Ot. locked(ch.l, *Wt*, Ot) ∗ inv(ch)(*Wt*, Ot)}
release(ch.l)
{obs(O) ∗ lock(ch.l)}}

**Fig. 7.** Verification of the program in Fig. 1
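Operationally, the program of Fig. 1 being verified here can be sketched in Python, with `threading.Condition` playing the role of the condition variable v over the lock l (the `Channel` class and method names are ours; the ghost state of the proof has no runtime counterpart):

```python
import collections
import threading

class Channel:
    """Sketch of the channel of Fig. 1: an unbounded queue protected by a
    lock, with one condition variable on which receivers block while empty."""
    def __init__(self):
        self.q = collections.deque()
        self.l = threading.Lock()
        self.v = threading.Condition(self.l)

    def send(self, d):
        with self.l:
            self.q.append(d)
            self.v.notify()        # wakes one waiting receiver, if any

    def receive(self):
        with self.l:
            while not self.q:      # wait loop, as in the receive routine
                self.v.wait()
            return self.q.popleft()

ch = Channel()
out = []
t = threading.Thread(target=lambda: out.append(ch.receive()))
t.start()
ch.send(12)
t.join()
print(out)  # -> [12]
```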

### **3.6 Other Examples**

Using the proof system of this section we verify two other deadlock-free programs, namely the *sleeping barber* [16] and a *barrier*. In the barrier program shown in Fig. 8, a barrier b consists of an integer variable r indicating the number of the remaining

```
routine main(){
r:=newint(3);
l:=newlock;
v:=newcond;
b:=barrier(r, l, v);
fork (task1(); wait for rest(b); task2());
fork (task1(); wait for rest(b); task2());
task1(); wait for rest(b); task2()}
```

```
routine wait for rest(barrier b){
acquire(b.l);
b.r:=b.r−1;
if(b.r=0)
  notifyAll(b.v);
else
  while(b.r>0)
    wait(b.v, b.l);
release(b.l)}
```

```
inv(barrier b) ::= λWt. λOt. ∃r. b.r ↦ r ∧ L(b.v)=b.l ∧ M(b.v)=true ∧
    (Wt(b.v)=0 ∨ 0 < r) ∧ (r ⩽ Ot(b.v))
```

```
routine main(){{obs({[]})}
r:=newint(3); l:=newlock; v:=newcond;
{obs({[]}) ∗ r ↦ 3 ∗ ulock(l, {[]}, {[]}) ∧ L(v)=l ∧ M(v)=true ∧ R(l)=0 ∧ R(v)=1}
b:=barrier(r, l, v);
{obs({[]}) ∗ inv(b)({[]}, {[3·v]}) ∗ ulock(l, {[]}, {[]})}
g chrg(v); g chrg(v); g chrg(v); g initl(l);
{obs({[3·v]}) ∗ lock(l) ∧ I(l)=inv(b)}
fork (wait for rest(b));
{obs({[2·v]}) ∗ lock(l)}
fork (wait for rest(b));
{obs({[v]}) ∗ lock(l)}
wait for rest(b) {obs({[]})}}
```

```
routine wait for rest(barrier b){
```
{obs(O ⊎ {[b.v]}) ∗ lock(b.l) ∧ b.l ≺ O ⊎ {[b.v]} ∧ b.v≺O ∧ I(b.l)=inv(b)}
acquire(b.l);
{obs(O ⊎ {[b.v, b.l]}) ∗ ∃*Wt*, Ot. locked(b.l, *Wt*, Ot) ∗ inv(b)(*Wt*, Ot)}
b.r:=b.r−1;
if(b.r=0){
notifyAll(b.v);
{obs(O ⊎ {[b.v, b.l]}) ∗ ∃*Wt*, Ot. locked(b.l, *Wt*[b.v:=0], Ot) ∗ inv(b)(*Wt*[b.v:=0], Ot − {[b.v]})}
g disch(b.v)
{obs(O ⊎ {[b.l]}) ∗ ∃*Wt*, Ot. locked(b.l, *Wt*, Ot) ∗ inv(b)(*Wt*, Ot)}}
else{
{obs(O ⊎ {[b.v, b.l]}) ∗ ∃*Wt*, Ot. locked(b.l, *Wt*, Ot) ∗ inv(b)(*Wt*, Ot − {[b.v]})}
g disch(b.v);
{obs(O ⊎ {[b.l]}) ∗ ∃*Wt*, Ot. locked(b.l, *Wt*, Ot) ∗ inv(b)(*Wt*, Ot)}
while(b.r>0)
{obs(O ⊎ {[b.l]}) ∗ ∃*Wt*, Ot. locked(b.l, *Wt*, Ot) ∗ inv(b)(*Wt* ⊎ {[b.v]}, Ot)}
wait(b.v, b.l)
{obs(O ⊎ {[b.l]}) ∗ ∃*Wt*, Ot. locked(b.l, *Wt*, Ot) ∗ inv(b)(*Wt*, Ot)}};
release(b.l)
{obs(O) ∗ lock(b.l)}}

threads that must call the routine wait for rest, a lock l protecting r against data races, and a condition variable v. Each thread executing the routine wait for rest first decreases the variable r; if the resulting value is still positive it waits for v, and otherwise it notifies all threads waiting for v. In this program the barrier is initialized to 3, implying that no thread may start task2 until all three threads in this program have finished task1. This program is deadlock-free because the routine wait for rest is executed by three different threads. Figure 8 illustrates how this program can be verified by the presented proof rules. Note that before executing g disch in the else branch, safe obs holds because at this point we have 0 < b.r, which implies 1 < b.r before the execution of b.r := b.r − 1, and by the invariant we have 1 < Ot(b.v), implying 0 < (Ot − {[b.v]})(b.v). The interesting point about the verification of this program is that since all the threads waiting for condition variable v in this program are notified by the command notifyAll, the invariant of the related lock, implying one ob(b.v, *Wt*, Ot), is significantly different from the ones defined in the channel and sleeping barber examples. Generally, for a condition variable v on which only notifyAll is executed (and not notify) a lock invariant can imply the invariant one ob(v, *Wt*, Ot) if it asserts *Wt*(v)=0 ∨ S(v) ⩽ Ct(v) and Ct(v) < Ot(v) + S(v), where Ct(v) is the total number of credits for v and S(v) is an integer value such that the command wait(v,l) is executed only if S(v) ⩽ 0. For this particular example S(b.v)=1−b.r and Ct(b.v) = 0, since this program can be verified without incorporating the notion of credits.
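The barrier of Fig. 8 can be sketched in Python as follows (field names follow the figure; the class wrapper and test harness are ours):

```python
import threading

class Barrier:
    """Sketch of the barrier of Fig. 8: r counts the threads that have
    not yet reached the barrier."""
    def __init__(self, n):
        self.r = n
        self.l = threading.Lock()
        self.v = threading.Condition(self.l)

    def wait_for_rest(self):
        with self.l:
            self.r -= 1
            if self.r == 0:
                self.v.notify_all()   # last arrival releases everyone
            else:
                while self.r > 0:     # earlier arrivals wait
                    self.v.wait()

events = []
ev_lock = threading.Lock()
b = Barrier(3)

def worker():
    with ev_lock:
        events.append("task1")
    b.wait_for_rest()                 # no task2 may start before all task1 finish
    with ev_lock:
        events.append("task2")

ts = [threading.Thread(target=worker) for _ in range(2)]
for t in ts:
    t.start()
worker()
for t in ts:
    t.join()
print(events[:3].count("task1"), events[3:].count("task2"))  # -> 3 3
```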

### **4 Relaxing the Precedence Relation**

The precedence relation ≺ introduced in [4] makes sure that all threads wait for waitable objects in strict ascending order with respect to the wait level associated with each waitable object (or, as in this paper, in descending order), ensuring that in any state of the execution there is no cycle in the corresponding wait-for graph. However, this relation is too restrictive and prevents verifying some programs that are actually deadlock-free, such as the one shown on the left side of Fig. 9. In this program a value is increased by two threads communicating through a channel. Each thread receives a value from the channel, increases that value, and then sends it back on the channel. Since an initial value is sent on the related channel, this program is deadlock-free. A first attempt to verify this program is illustrated in the middle part of Fig. 9, where the credit required to verify the receive command in the routine inc is provided by the send command executed immediately after it, and not by the precondition of the routine. In other words, the idea is to load a credit and an obligation for ch in the routine inc itself, and then spend the loaded credit to verify the receive(ch) command and fulfil the loaded obligation by the send(ch) command. However, this idea fails because the receive command in the routine inc cannot be verified: one of its preconditions, ch ≺ {[ch]}, never holds. Kobayashi [6,20] has addressed this problem in his type system by using the notion of *usages* and assigning levels to each *obligation/capability*, instead of


**Fig. 9.** A deadlock-free program verified by exploiting the relaxed precedence relation

waitable objects. However, in the next section we provide a novel idea to address this problem by just relaxing the precedence relation used in the presented proof rules.

### **4.1 A Relaxed Precedence Relation**

To tackle the problem mentioned in the previous section we relax the precedence relation enforced by ≺, replacing ≺ by a relation ≼ satisfying the following property: o ≼ O holds if either o ≺ O, or (1) o ≺ O − {[o]}, and (2) o satisfies the property that in any execution state, if a thread waits for o then there exists a thread that can discharge an obligation for o and is not waiting for any object whose wait level is equal to or greater than the wait level of o. This property still guarantees that, in any state of the execution in which some threads are suspended waiting for some obligations, there is always a thread obliged to fulfil the obligation omin that is not blocked, where omin has a minimal wait level among all waitable objects for which a thread is waiting.

Condition (2) is met if it is an invariant that, for any condition variable o for which a thread is waiting, the total number of obligations is greater than the total number of waiting threads. Since each thread waiting for o has at most one instance of o in the bag of its obligations, by the *pigeonhole principle*, if the number of obligations for o is higher than the number of threads waiting for o then there exists a thread that holds an obligation for o and is not waiting for o, implying condition (2), because this thread only waits for objects whose wait levels are lower than the wait level of o. Accordingly, we first introduce a new function P in the proof rules mapping each waitable object to a boolean value, and then make sure that for any object o for which a thread is waiting, if P(o) = true then *Wt*(o) < Ot(o). With the help of this function we define the relaxed precedence relation as shown in Definition 1.

**Definition 1 (Relaxed precedence relation).** *The relaxed precedence relation indexed over functions* R *and* P *holds for a waitable object* v *and a bag of obligations* O*, denoted by* v ≼ O*, if and only if:*

$$v \prec O \lor \left(v \prec O - \{v\} \land \mathsf{P}(v) = \mathsf{true}\right), \text{ where } v \prec O \Leftrightarrow \forall o \in O. \text{ } \mathsf{R}(v) < \mathsf{R}(o)$$
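An executable reading of Definition 1 (a sketch; R and P are plain dicts here, and the bag O is a list of waitable objects that may contain duplicates):

```python
def strictly_precedes(v, O, R):
    # v ≺ O  ⇔  ∀o ∈ O. R(v) < R(o)
    return all(R[v] < R[o] for o in O)

def relaxed_precedes(v, O, R, P):
    # v ≼ O  ⇔  v ≺ O  ∨  (v ≺ O − {[v]}  ∧  P(v) = true)
    if strictly_precedes(v, O, R):
        return True
    O_minus_v = list(O)
    if v in O_minus_v:
        O_minus_v.remove(v)        # remove a single occurrence (bag minus)
    return P.get(v, False) and strictly_precedes(v, O_minus_v, R)

R = {"ch": 1}
# ch ≺ {[ch]} never holds, but ch ≼ {[ch]} does once P(ch) = true:
print(strictly_precedes("ch", ["ch"], R))                  # -> False
print(relaxed_precedes("ch", ["ch"], R, {"ch": True}))     # -> True
```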

Using this relaxed precedence relation, the approach presented by Leino *et al.* [4] can also support more complex programs, such as the one on the left side of Fig. 9. This approach can exploit the relation by (1) replacing the original precedence relation ≺ by the relaxed one ≼, and (2) replacing the rule associated with creating a channel by the one shown below. According to this proof rule, for each channel ch the function P, in the definition of the relaxed precedence relation, is initialized when ch is created, such that if P(ch) is decided to be true then one obligation for ch is loaded onto the bag of obligations of the creating thread. The approach is still sound because for any channel ch for which P is true the invariant *Wt*(ch) + Ct(ch) < Ot(ch) + sizeof(ch) holds. Combined with the fact that in this language, where channels are primitive constructs, *Wt*(ch) > 0 ⇒ sizeof(ch) = 0, we have *Wt*(ch) > 0 ⇒ *Wt*(ch) < Ot(ch). Now consider a deadlocked state, where each thread is waiting for a waitable object. Among all of these waitable objects take one having a minimal wait level, namely om. If om is a lock or a channel with P(om) = false, then at least one thread has an obligation for om and is waiting for an object o whose wait level is lower than the wait level of om, which contradicts minimality of the wait level of om. Otherwise, since *Wt*(om) > 0 we have *Wt*(om) < Ot(om). Additionally, we know that each thread waiting for om has at most one obligation for om. Accordingly, there must be a thread holding an obligation for om that is not waiting for om. Consequently, this thread must be waiting for an object o whose wait level is lower than the wait level of om, which again contradicts minimality of the wait level of om.

```
{obs(O)} newchannel {λch. obs(O′) ∧ R(ch) = z ∧ P(ch) = b
   ∧ ((b = false ∧ O′ = O) ∨ (b = true ∧ O′ = O ⊎ {[ch]}))}
```
To exploit the relaxed definition in the approach presented in this paper we only need to make sure that for any condition variable v for which a thread is waiting if P(v) is true then Ot(v) is greater than *Wt*(v). To achieve this goal we include this invariant in the definition of the invariant safe obs, shown in Definition 2, an invariant that must hold when a command wait or a ghost command g disch is executed.

**Definition 2 (Safe Obligations).** *The relation* safe obs(v,*Wt*, Ot)*, indexed over function* P*, holds if and only if:*

one ob(v, *Wt*, Ot) ∧ (P(v) = true ⇒ spare ob(v, *Wt*, Ot))*, where* one ob(v, *Wt*, Ot) ⇔ (*Wt*(v) > 0 ⇒ Ot(v) > 0) *and* spare ob(v, *Wt*, Ot) ⇔ (*Wt*(v) > 0 ⇒ *Wt*(v) < Ot(v))



**Fig. 10.** A readers-writers program with variables aw, holding the number of threads writing, ww, holding the number of threads waiting to write, and ar, holding the number of threads reading, synchronized using a monitor consisting of condition variables vw, preventing writers from writing while other threads are reading or writing, and vr, preventing readers from reading while there is another thread writing or waiting to write.

**Readers-Writers Locks.** As another application of this relaxed definition consider a readers-writers program, shown in Fig. 10<sup>4</sup>, where the condition variable vw prevents writers from writing to a shared memory while that memory is being accessed by other threads. After reading the shared memory, a reader thread notifies this condition variable if there is no other thread reading that memory. This condition variable is also notified by a writer thread when it finishes its writing. Consequently, a writer thread might first wait for vw and then fulfil an obligation for this condition variable. This program is verified if the writer thread itself produces a credit and an obligation for vw and then uses the former for the command wait(vw, l) and fulfils the latter at the end of its execution. Accordingly, since when the command wait(vw, l) is executed vw is in the bag of obligations of the

<sup>4</sup> The abort commands in this program can be eliminated using the ghost counters from Fig. 6. However, we leave them in for simplicity.

```
inv(rdwr b) ::= λWt. λOt. ∃Ctw. ctr(b.cw, Ctw) ∗
∃aw, ww, ar. b.aw ↦ aw ∗ b.ww ↦ ww ∗ b.ar ↦ ar ∧
L(b.vw)=L(b.vr)=b.l ∧ M(b.vw)=tic(b.cw) ∧ M(b.vr)=true ∧
P(vw)=true ∧ P(vr)=false ∧
(Wt(b.vr)=0 ∨ 0 < aw + ww) ∧
aw + ww ⩽ Ot(b.vr) ∧
Wt(b.vw) + Ctw + aw + ar ⩽ Ot(b.vw) ∧
(Wt(b.vw)=0 ∨ Wt(b.vw) < Ot(b.vw))
```

```
routine main(){
aw:=newint(0); ww:=newint(0);
ar:=newint(0); l:=newlock;
vw:=newcond; vr:=newcond;
b := rdwr(aw, ww, ar, l, vw, vr);
b.cw:=g newctr;
{obs({[]}) ∗ inv(b)({[]}, {[]}) ∗ ulock(l, {[]}, {[]}) ∗
L(vw)=L(vr)=l ∧ M(vw)=tic(b.cw) ∧
M(vr)=true ∧ R(l)=0 ∧ R(vw)=1 ∧
R(vr)=2 ∧ L(vw)=l ∧ L(vr)=l
∧ P(vw)=true ∧ P(vr)=false} g initl(l);
{obs({[]}) ∗ lock(l) ∧ I(l)=inv(b)}
fork( {obs({[]}) ∗ lock(l)}
 while (true) fork(reader(b)));
{obs({[]}) ∗ lock(l)}
while (true) fork(writer(b))
{obs({[]}) ∗ lock(l)}}
```

```
routine reader(rdwr b){
{obs(O) ∗ lock(b.l) ∧ b.l ≼ O ⊎ {[b.vw]}
∧ b.vr ≼ O ∧ I(b.l)=inv(b)}
acquire(b.l);
while(b.aw+b.ww>0)
  wait(b.vr, b.l);
b.ar:=b.ar+1;
g chrg(b.vw);
release(b.l);
// Perform reading ...
acquire(b.l);
if(b.ar<1)
  abort;
b.ar:=b.ar−1;
if (Wt(b.vw) > 0) g inc(b.cw);
notify(b.vw);
g disch(b.vw);
release(b.l) {obs({[]}) ∗ lock(b.l)}}
```

```
routine writer(rdwr b){
{obs(O) ∗ lock(b.l) ∧ b.l ≼ O ⊎ {[b.vw, b.vr]}
∧ b.vw ≼ O ⊎ {[b.vw, b.vr]} ∧ I(b.l)=inv(b)}
acquire(b.l);
g chrg(b.vw); g inc(b.cw);
g chrg(b.vr);
while(b.aw+b.ar>0){
  g dec(b.cw);
  b.ww:=b.ww+1;
  wait(b.vw, b.l);
  if(b.ww<1)
    abort();
  b.ww:=b.ww−1
};
b.aw:=b.aw+1;
g dec(b.cw);
release(b.l);
// Perform writing ...
acquire(b.l);
if(b.aw≠1)
  abort;
b.aw:=b.aw−1;
if (Wt(b.vw) > 0) g inc(b.cw);
notify(b.vw);
if(b.ww=0)
  notifyAll(b.vr);
g disch(b.vw); g disch(b.vr);
release(b.l) {obs({[]}) ∗ lock(b.l)}}
```
writer thread, this command can be verified if vw ≼ O ⊎ {[vw]}, for which P(vw) must be true. The verification of this program is illustrated in Fig. 11. Generally, for a condition variable v for which P(v) = true a lock invariant can imply the invariant one ob(v, *Wt*, Ot) if it asserts *Wt*(v) + Ct(v) < Ot(v) + S(v) and *Wt*(v) = 0 ∨ *Wt*(v) < Ot(v), where Ct(v) is the total number of credits for v and S(v) is an integer value such that wait(v,l) is executed only if S(v) ⩽ 0.
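The monitor of Fig. 10 can be sketched in Python as follows (field names follow the figure: aw = active writers, ww = waiting writers, ar = active readers; the class wrapper, method names, and test harness are ours, and the ghost commands of the proof are omitted):

```python
import threading

class RdWr:
    """Sketch of the readers-writers monitor of Fig. 10: two condition
    variables over one lock."""
    def __init__(self):
        self.aw = self.ww = self.ar = 0
        self.l = threading.Lock()
        self.vw = threading.Condition(self.l)   # writers wait here
        self.vr = threading.Condition(self.l)   # readers wait here

    def begin_read(self):
        with self.l:
            while self.aw + self.ww > 0:        # writers (even waiting ones) win
                self.vr.wait()
            self.ar += 1

    def end_read(self):
        with self.l:
            self.ar -= 1
            if self.ar == 0:
                self.vw.notify()                # last reader wakes a writer

    def begin_write(self):
        with self.l:
            while self.aw + self.ar > 0:
                self.ww += 1
                self.vw.wait()
                self.ww -= 1
            self.aw += 1

    def end_write(self):
        with self.l:
            self.aw -= 1
            self.vw.notify()                    # prefer a waiting writer
            if self.ww == 0:
                self.vr.notify_all()            # otherwise release the readers

m = RdWr()
total = 0

def writer():
    global total
    for _ in range(100):
        m.begin_write()
        total += 1                              # exclusive access guaranteed
        m.end_write()

ts = [threading.Thread(target=writer) for _ in range(3)]
for t in ts:
    t.start()
for t in ts:
    t.join()
print(total)  # -> 300
```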

### **4.2 A Further Relaxation**

The relation ≼ allows one to verify some deadlock-free programs where a thread waits for a condition variable while that thread is also obliged to fulfil an obligation for that variable. However, it is still possible to give a more general, more relaxed definition of this relation. Under this definition a thread with obligations O is allowed to wait for a condition variable v if either v≺O, or there exists an obligation o such that (1) v ≺ O − {[o]}, and (2) o satisfies the property that in any execution state, if a thread is waiting for o then there exists a thread that is not waiting for any waitable object whose wait level is equal to or greater than the wait levels of v and o. This new definition still guarantees that, in any state of the execution in which some threads are suspended waiting for some obligations, there is always a thread obliged to fulfil the obligation omin that is not suspended, where omin has a minimal wait level among all waitable objects for which a thread is waiting. To satisfy condition (2) we introduce a new definition for ≼, shown in Definition 3, that uses a new function X mapping each lock to a set of wait levels. This definition is sound only if the proof rules ensure that, for any condition variable v whose wait level is in X(L(v)), the number of obligations is equal to or greater than the number of waiting threads.

This definition is still sound because of Lemma 1, which has been machine-checked in Coq<sup>5</sup>, where G is a bag of (waitable object, bag of obligations) pairs such that each element t of G is associated with a thread in a state of the execution: the first component of t is the object for which that thread is waiting and the second component is the bag of obligations of that thread. This lemma implies that if all the mentioned rules, denoted by H1 to H4, are respected in any state of the execution, then it is impossible that all threads in that state are waiting for a waitable object. This lemma can be proved by induction on the number of elements of G, considering the element waiting for an object whose wait level is minimal (see [16] for a detailed proof).

**Definition 3 (Relaxed precedence relation).** *The new precedence relation indexed over functions* R, L, P, X *holds for a waitable object* v *and a bag of obligations* O*, denoted by* v ≼ O*, if and only if:*

<sup>5</sup> The machine-checked proof can be found at https://github.com/jafarhamin/ deadlock-free-monitors-soundness.

$$\begin{array}{l}
(v \prec O \lor v \rightsquigarrow O) \land (\neg\,\mathsf{exc}(v) \lor v \perp O), \text{ where}\\
v \prec O \Leftrightarrow \forall o \in O.\ \mathsf{R}(v) < \mathsf{R}(o)\\
v \rightsquigarrow O \Leftrightarrow \mathsf{P}(v) = \mathsf{true} \land \mathsf{exc}(v) \land {}\\
\qquad \exists o.\ v \prec O - \{[o]\} \land \mathsf{R}(v) \leqslant \mathsf{R}(o) + 1 \land \mathsf{L}(v) = \mathsf{L}(o) \land \mathsf{exc}(o)\\
\mathsf{exc}(v) \Leftrightarrow \mathsf{R}(v) \in \mathsf{X}(\mathsf{L}(v))\\
v \perp O \Leftrightarrow \mathsf{let}\ Ox = \lambda v'.\ \begin{cases} O(v') & \text{if } \mathsf{R}(v') \in \mathsf{X}(\mathsf{L}(v)) \\ 0 & \text{otherwise} \end{cases} \mathsf{in}\ |Ox| \leqslant 1 \land \forall v'.\ Ox(v') > 0 \Rightarrow \mathsf{L}(v') = \mathsf{L}(v)
\end{array}$$

**Lemma 1 (A Valid Graph Is Not Deadlocked).** ∀G: *Bags*(*WaitObjs* × *Bags*(*WaitObjs*)), R: *WaitObjs* → *WaitLevels*, L: *WaitObjs* → *Locks*, P: *WaitObjs* → *Bools*, X: *Locks* → *Sets*(*WaitLevels*).
H1 ∧ H2 ∧ H3 ∧ H4 ⇒ G = {[]}, where
H1: ∀(o, O) ∈ G. 0 < Ot(o)
H2: ∀(o, O) ∈ G. P(o) = true ⇒ Wt(o) < Ot(o)
H3: ∀(o, O) ∈ G. R(o) ∈ X(L(o)) ⇒ Wt(o) ⩽ Ot(o)
H4: ∀(o, O) ∈ G. o ≼_{R,L,P,X} O
where Wt = ⨄_{(o,O)∈G} {[o]} and Ot = ⨄_{(o,O)∈G} O

NewLock {true} newlock {λl. ulock(l, {[]}, {[]}) ∧ R(l)=z ∧ X(l)=X}

NewCv {true} newcond {λv. R(v)=z ∧ L(v)=l ∧ M(v)=m ∧ P(v)=b}

**Fig. 12.** New proof rules initializing functions X and P used in safe obs and ≼

To extend the proof rules with the new precedence relation it suffices to include a new invariant own ob in the definition of safe obs, as shown in Definition 4 (an invariant that must hold when a command wait or a ghost command g disch is executed), to make sure that for any condition variable for which exc holds, the number of obligations is equal to or greater than the number of waiting threads. Additionally, the functions X and P, as indicated in Fig. 12, are initialized when a lock and a condition variable are created, respectively. The rest of the proof rules are the same as those defined in Fig. 5, except that the old precedence relation (≺) is replaced by the new one (≼).

**Definition 4 (Safe Obligations).** *The relation* safe obs(v,*Wt*, Ot)*, indexed over functions* R, L, P,X*, holds if and only if:*

one ob(v, *Wt*, Ot) ∧ (P(v) = true ⇒ spare ob(v, *Wt*, Ot)) ∧ (exc(v) = true ⇒ own ob(v, *Wt*, Ot))*, where* one ob(v, *Wt*, Ot) ⇔ (*Wt*(v) > 0 ⇒ Ot(v) > 0)*,* spare ob(v, *Wt*, Ot) ⇔ (*Wt*(v) > 0 ⇒ *Wt*(v) < Ot(v))*, and* own ob(v, *Wt*, Ot) ⇔ (*Wt*(v) ⩽ Ot(v))
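An executable sketch of Definition 4, with bags represented as dicts mapping objects to multiplicities, and P and exc as plain dicts (all representation choices are ours):

```python
def one_ob(v, Wt, Ot):
    # Wt(v) > 0  ⇒  Ot(v) > 0
    return Ot.get(v, 0) > 0 if Wt.get(v, 0) > 0 else True

def spare_ob(v, Wt, Ot):
    # Wt(v) > 0  ⇒  Wt(v) < Ot(v)
    return Wt.get(v, 0) < Ot.get(v, 0) if Wt.get(v, 0) > 0 else True

def own_ob(v, Wt, Ot):
    # Wt(v) ⩽ Ot(v)
    return Wt.get(v, 0) <= Ot.get(v, 0)

def safe_obs(v, Wt, Ot, P, exc):
    return (one_ob(v, Wt, Ot)
            and (not P.get(v, False) or spare_ob(v, Wt, Ot))
            and (not exc.get(v, False) or own_ob(v, Wt, Ot)))

Ot = {"v": 2}
print(safe_obs("v", {"v": 1}, Ot, {"v": True}, {"v": True}))  # -> True
print(safe_obs("v", {"v": 2}, Ot, {"v": True}, {}))           # -> False (no spare obligation)
```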

**Bounded Channels.** One application of the new definition is a bounded channel program, shown in Fig. 13, where a sender thread waits for a receiver thread if the channel is full, synchronized by vf, and a receiver thread waits for a sender thread if the channel is empty, synchronized by ve. More precisely, the sender thread, holding an obligation for ve, might execute the command wait(vf, l), and the receiver thread, holding an obligation for vf, might execute a command wait(ve, l).
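The bounded channel of Fig. 13 can be sketched in Python with two condition variables over one lock (the `BoundedChannel` class and names are ours; the ghost commands of the proof are omitted):

```python
import collections
import threading

class BoundedChannel:
    """Sketch of the bounded channel of Fig. 13: senders wait on vf while
    the queue is full, receivers wait on ve while it is empty."""
    def __init__(self, max_size):
        self.q = collections.deque()
        self.max = max_size
        self.l = threading.Lock()
        self.vf = threading.Condition(self.l)   # "not full" waiters (senders)
        self.ve = threading.Condition(self.l)   # "not empty" waiters (receivers)

    def send(self, d):
        with self.l:
            while len(self.q) == self.max:
                self.vf.wait()
            self.q.append(d)
            self.ve.notify()

    def receive(self):
        with self.l:
            while not self.q:
                self.ve.wait()
            d = self.q.popleft()
            self.vf.notify()
            return d

ch = BoundedChannel(max_size=1)
out = []
t = threading.Thread(target=lambda: [out.append(ch.receive()) for _ in range(3)])
t.start()
for d in (1, 2, 3):
    ch.send(d)      # blocks whenever the 1-slot buffer is full
t.join()
print(out)  # -> [1, 2, 3]
```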

```
routine main(){
q := newqueue;
l := newlock;
vf := newcvar;
ve := newcvar;
ch:=channel(q, l, vf, ve);
fork (receive(ch));
send(ch, 12)}

routine send(channel ch, int d){
acquire(ch.l);
while(sizeof(ch.q) = max)
  wait(ch.vf, ch.l);
enqueue(ch.q, d);
notify(ch.ve);
release(ch.l)}

routine receive(channel ch){
acquire(ch.l);
while(sizeof(ch.q) = 0)
  wait(ch.ve, ch.l);
dequeue(ch.q);
notify(ch.vf);
release(ch.l)}
inv(channel ch) ::= λWt. λOt. ∃Cte, Ctf. ctr(ch.ce, Cte) ∗ ctr(ch.cf, Ctf) ∗
 ∃s. queue(ch.q, s) ∧ P(ve)=false ∧ M(ve)=tic(ch.ce) ∧ M(vf)=tic(ch.cf) ∧
 L(ch.ve)=L(ch.vf)=ch.l ∧
 Wt(ch.ve) + Cte ⩽ Ot(ch.ve) + s ∧ Wt(ch.ve) ⩽ Ot(ch.ve) ∧
 Wt(ch.vf) + Ctf + s < Ot(ch.vf) + max ∧ (Wt(vf)=0 ∨ Wt(ch.vf) < Ot(ch.vf))
routine main(){
q := newqueue;
l := newlock;
vf := newcvar;
ve := newcvar;
ch:=channel(q, l, vf , ve);
ch.ce:=g newctr;
ch.cf :=g newctr;
g inc(ch.ce);
g inc(ch.cf );
g chrg(ve); g chrg(vf );
g initl(l);
{obs({[ve, vf}] ) ∗ lock(l) ∗
tic(ch.ce) ∗ tic(ch.cf ) ∗
L(vf )=l ∧ L(ve)=l ∧
M(ve)=tic(ch.ce) ∧
M(vf )=tic(ch.cf ) ∧
P(vf )=true ∧
P(ve)=false ∧
R(l)=0 ∧
R(ve)=1 ∧ R(vf )=2 ∧
X(l)={1, 2} ∧ I(l)=inv}
fork (receive(ch));
send(ch, 12) {obs({[]})}}
                           routine send(channel ch, int d)
                           {{obs(O{[ch.ve}] ) ∗ tic(ch.cf ) ∗
                           lock(ch.l) ∧ ch.lO{[ch.ve} ∧]
                           ch.vfO{[ch.ve}∧] I(ch.l)=inv}
                           acquire(ch.l);
                           while(sizeof(ch.q) = max){
                             g dec(ch.cf );
                             wait(ch.vf , ch.l)};
                           enqueue(ch.q, d);
                           if (Wt(b.ve) > 0)
                             g inc(b.ce);
                           notify(ch.ve);
                           g disch(ch.ve);
                           g dec(ch.cf );
                           release(ch.l)
                           {obs(O) ∗ lock(ch.l)}}
                                                              routine receive(channel ch){
                                                              {obs(O{[ch.vf}] ) ∗ tic(ch.ce) ∗
                                                               lock(ch.l) ∧ ch.lO{[ch.vf} ∧]
                                                               ch.veO{[ch.vf}∧] I(ch.l)=inv}
                                                               acquire(ch.l);
                                                               while(sizeof(ch.q) = 0){
                                                                g dec(ch.ce);
                                                                wait(ch.ve, ch.l)};
                                                               dequeue(ch.q);
                                                               if (Wt(b.vf ) > 0)
                                                                g inc(b.cf );
                                                               notify(ch.vf );
                                                               g disch(ch.vf );
                                                               g dec(ch.ce);
                                                               release(ch.l)
                                                               {obs(O) ∗ lock(ch.l)}}
```
**Fig. 13.** Verification of a bounded channel synchronized using a monitor consisting of condition variables v<sub>f</sub>, preventing sending on a full channel, and v<sub>e</sub>, preventing taking messages from an empty channel

Since v<sub>e</sub> and v<sub>f</sub> are not equal, it is impossible to verify this program with the old definition of ≺, because the wait levels of v<sub>e</sub> and v<sub>f</sub> cannot each be lower than the other. Thanks to the new definition of ≺, this program can be verified, as shown in Fig. 13, by initializing P(v<sub>f</sub>) to true and X(l) to {1, 2}, where the consecutive numbers 1 and 2 are the wait levels of v<sub>e</sub> and v<sub>f</sub>, respectively.
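The synchronization pattern of Fig. 13 can also be exercised with an ordinary monitor implementation. The following Python counterpart is a sketch of our own (the class name `BoundedChannel` and the use of `threading.Condition` are not part of the paper's language), with one lock l and the conditions vf and ve sharing it:

```python
import threading
from collections import deque

class BoundedChannel:
    """Bounded channel as in Fig. 13: senders wait on vf when the queue is
    full, receivers wait on ve when it is empty; both conditions share l."""

    def __init__(self, maximum):
        self.q = deque()
        self.max = maximum
        self.l = threading.Lock()
        self.vf = threading.Condition(self.l)  # "not full"
        self.ve = threading.Condition(self.l)  # "not empty"

    def send(self, d):
        with self.l:
            while len(self.q) == self.max:
                self.vf.wait()
            self.q.append(d)
            self.ve.notify()

    def receive(self):
        with self.l:
            while len(self.q) == 0:
                self.ve.wait()
            d = self.q.popleft()
            self.vf.notify()
            return d
```

As in the verified program, every wait sits in a loop re-checking the queue size, and each operation notifies the opposite condition variable before releasing the lock.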

### **5 Soundness Proof**

In this section we provide a soundness proof for the present approach<sup>6</sup>: if a program is verified by the proposed proof rules, where the verification starts from an empty bag of obligations and also ends with such a bag, then this program is deadlock-free. To this end, we first define the syntax of programs and a small-step semantics relating two *configurations* (see [16] for formal definitions). A configuration is a thread table-heap pair (t, h), where heaps are partial functions from locations to integers and thread tables are partial functions from thread identifiers to command-*context* pairs (c; ξ); a context, denoted by ξ, is either done or let x:=[] in c; ξ. Then we define *validity of configurations*, shown in Definition 5, and prove that (1) if a program c is verified by the proposed proof rules, starting from the precondition obs({[]}) and satisfying the postcondition λ_. obs({[]}), then the initial configuration, where the heap is empty, denoted by **0** = λ_. ∅, and there is only one thread with command c and context done, is a valid configuration (Theorem 4), (2) a valid configuration is not deadlocked (Theorem 5), and (3) starting from a valid configuration, all subsequent configurations of the execution are also valid (Theorem 6).

In a valid configuration (t, h), h contains all the heap ownerships in possession of the threads in t, as well as those in possession of the locks that are not held, specified by a list A. Additionally, each thread must have all the permissions required to be successfully verified with no remaining obligation, enforced by wpcx. In this definition, wpcx(c, ξ) is a function returning the weakest precondition of the command c with the context ξ w.r.t. the postcondition λ_. obs({[]}) (see [16] for formal definitions). This function is defined with the help of a function wp(c, a) returning the weakest precondition of the command c w.r.t. the postcondition a.

**Definition 5 (Validity of Configurations).** *A configuration is valid, denoted by* valid(t, h)*, if there exist a list of* augmented threads T*, consisting of an identifier (*id*), a program (*c*), a context (*ξ*), a permission heap (*p*), a ghost resource heap (*C*) and a bag of obligations (*O*) associated with each thread; a list of assertions* A*, and some functions* R, I, L, M, P, X *such that:*

*–* ∀id, c, ξ. t(id)=(c; ξ) ⇔ ∃p, O, C. (id, c, ξ, p, O, C) ∈ T
*–* h = pheap2heap(∗<sub>a∈A</sub> a ∗ ∗<sub>(id,c,ξ,p,O,C)∈T</sub> p)

<sup>6</sup> The machine-checked version of some lemmas and theorems in this proof, such as Theorems 4 and 5, can be found at https://github.com/jafarhamin/deadlock-free-monitors-soundness.

*–* <sup>∀</sup>(id, c, ξ, p, O, C) <sup>∈</sup> T.


*where*


We finally prove that for each proof rule {a} c {a′} we have a ⇒ wp(c, a′). To this end, we first define *correctness of commands*, shown in Definition 6, and then for each proof rule {a} c {a′} we prove correct(a, c, a′). In addition to the proof rules presented in this paper, other useful rules, such as *consequence*, *frame*, and *sequential composition*, shown in Theorems 1, 2, and 3, can also be proved with the help of some auxiliary lemmas in [16]. Note that the indices R, I, L, M, P, X are omitted when they are unimportant.

### **Definition 6 (Correctness of Commands)**

$$\mathsf{correct}\_{R,I,L,M,P,X}(a,c,a') \Leftrightarrow (a \Rightarrow \mathsf{wp}\_{R,I,L,M,P,X}(c,a'))$$

**Theorem 1 (Rule Consequence)**

$$\mathsf{correct}(a\_1, c, a\_2) \land (a'\_1 \Rightarrow a\_1) \land (\forall z.\, a\_2(z) \Rightarrow a'\_2(z)) \Rightarrow \mathsf{correct}(a'\_1, c, a'\_2)$$

**Theorem 2 (Rule Frame)**

$$\mathsf{correct}(a, c, a') \Rightarrow \mathsf{correct}(a \ast f, c, \lambda z. \, a'(z) \ast f)$$

**Theorem 3 (Rule Sequential Composition)**

$$\mathsf{correct}(a, c\_1, a') \land (\forall z.\, \mathsf{correct}(a'(z), c\_2[z/x], a'')) \Rightarrow \mathsf{correct}(a, \mathbf{let}\ x := c\_1\ \mathbf{in}\ c\_2, a'')$$

### **Theorem 4 (The Initial Configuration is Valid)**

correct<sub>R,I,L,M,P,X</sub>(obs({[]}), c, λ_. obs({[]})) ⇒ valid(**0**[id:=(c; done)], **0**)

*Proof.* The goal is achieved because there exist an augmented thread list T = [(id, c, done, **0**, {[]}, **0**)], a list of assertions A = [], and functions R, I, L, M, P, X under which all the conditions in the definition of validity of configurations are satisfied.

### **Theorem 5 (A Valid Configuration is Not Deadlocked)**

$$\begin{array}{l} (\exists id, c, \xi, o.\ t(id) = (c; \xi) \land \mathsf{waiting\\_for}(c, h) = o) \land \mathsf{valid}(t, h) \\ \Rightarrow \exists id', c', \xi'.\ t(id') = (c'; \xi') \land \mathsf{waiting\\_for}(c', h) = \mathsf{None} \end{array}$$

*Proof.* Towards a contradiction, assume that all threads in t are waiting for an object. Since (t, h) is a valid configuration, there exists a valid augmented thread table T with a corresponding valid graph G = g(T), where g maps each element (id, c, ξ, p, O, C) to (waiting_for(c), O). By Lemma 1, we have G = {[]}, implying T = {[]}, implying t = **0**, which contradicts the assumption of the theorem.
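Operationally, the deadlock notion in Theorem 5 says that a configuration is deadlocked when it has at least one thread and every thread is waiting for some object. A minimal Python sketch of this check (the function and parameter names are ours):

```python
def deadlocked(thread_ids, waiting_for):
    """A configuration is deadlocked iff it has at least one thread and
    every thread is waiting for some object (waiting_for returns the
    object a thread's command waits for, or None if it can step)."""
    return bool(thread_ids) and all(
        waiting_for(tid) is not None for tid in thread_ids)
```

Theorem 5 states exactly that valid configurations never satisfy this predicate: whenever some thread is waiting, some other thread is not.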

### **Theorem 6 (Steps Preserve Validity of Configurations).**<sup>7</sup>

valid(κ) ∧ κ → κ′ ⇒ valid(κ′)

*Proof.* By case analysis of the small step relation (see [16] explaining the proof of some non-trivial cases).

### **6 Related Work**

Several approaches have been proposed to verify termination [1,21], total correctness [3], and lock-freedom [2] of concurrent programs. These approaches are only applicable to non-blocking algorithms, where the suspension of one thread cannot lead to the suspension of other threads. Consequently, they cannot be used to verify deadlock-freedom of programs using condition variables, where the suspension of a notifying thread might leave a waiting thread blocked forever. In [22] a compositional approach to verifying termination of multi-threaded programs is introduced, where *rely-guarantee reasoning* is used to reason about each thread individually under assertions about the other threads. In this approach a program is considered terminating if it has no infinite computations. As a consequence, it is not applicable to programs using condition variables, because a waiting thread that is never notified cannot be considered a terminating thread.

There are also other approaches addressing common synchronization bugs of programs in the presence of condition variables. In [8], for example, an approach is presented to identify potential problems of concurrent programs consisting of wait and notify commands. However, it does not take the order of execution of these commands into account. In other words, it might accept an undesired execution trace in which the waiting thread is scheduled before the notifying thread, which might leave the waiting thread suspended forever. [9] uses Petri nets to identify common problems in multithreaded programs such as data races, lost signals, and deadlocks. However, the model introduced for condition variables in this approach only covers the communication of two threads, and it is not clear how it deals with programs having

<sup>7</sup> The proof of this theorem has not been machine-checked with Coq yet.

more than two threads communicating through condition variables. Recently, [10] introduced an approach ensuring that every thread synchronizing under a set of condition variables eventually exits the synchronization block, provided that it eventually reaches that block. This approach succeeds in verifying one of the applications of condition variables, namely the buffer. However, since this approach is not modular and relies on a Petri net analysis tool to solve the termination problem, it suffers from long verification times as the size of the state space increases; the verification of a buffer application with 20 producer and 18 consumer threads, for example, takes more than two minutes.

Kobayashi [6,20] proposed a type system for deadlock-free processes, ensuring that a well-typed process that is annotated with a finite *capability level* is deadlock-free. He extended channel types with the notion of *usages*, describing how often and in which order a channel is used for input and output. For example, the usage of x in the process x?y | x!1 | x!2, where ?, !, | represent an input action, an output action, and parallel composition respectively, is expressed by ? | ! | !, which means that x is used once for input and twice for output, possibly in parallel. Additionally, to avoid circular dependencies, each action α is associated with an obligation level o and a capability level c, denoted α<sup>o</sup><sub>c</sub>, such that (1) an obligation of level n must be fulfilled by using only capabilities of level less than n, and (2) for an action of capability level n, there must exist a co-action of obligation level less than or equal to n. Leino *et al.* [4] also proposed an approach to verify deadlock-freedom of channels and locks. In this approach, each thread trying to receive a message from a channel must spend one credit for that channel, where a credit for a channel is obtained when a thread takes on an obligation for that channel. A thread can fulfil an obligation for a channel either by sending a message on that channel or by delegating that obligation to another thread. The same idea is also used to verify deadlock-freedom of semaphores [7], where acquiring (i.e. decreasing) a semaphore consumes one credit and releasing (i.e. increasing) that semaphore produces one credit for that semaphore. However, as acknowledged in [4], it is impossible to treat condition variables like channels (or semaphores); a wait cannot be treated like a receive and a notify cannot be treated like a send, because a notification for a condition variable is lost if no thread is waiting for that variable.
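The two level conditions of Kobayashi's system can be illustrated with a toy check. The flat encoding below (explicit lists of obligation levels and the capability levels used to fulfil them) is ours, purely for illustration; the real system tracks this information in channel types.

```python
def fulfils_obligations(uses):
    """Condition (1): an obligation of level n may be fulfilled using
    only capabilities of level strictly less than n.
    `uses` is a list of (obligation_level, capability_levels_used) pairs."""
    return all(c < n for n, caps in uses for c in caps)

def has_coaction(cap_level, coaction_obligation_levels):
    """Condition (2): an action of capability level n requires a co-action
    of obligation level at most n."""
    return any(o <= cap_level for o in coaction_obligation_levels)
```

Together, the two conditions rule out circular waiting: every obligation can only depend on strictly lower-level capabilities, and every capability is eventually matched by some co-action.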
We borrow many ideas from these works, including the notions of obligations, credits (capabilities), and levels, and also from the approach introduced in [11], where a related separation logic based approach is presented to verify total correctness of programs in the presence of channels.

### **7 Conclusion**

In this article we introduced a modular approach to verifying deadlock-freedom of monitors. We also introduced a relaxed, more general precedence relation for avoiding cycles in the wait-for graph of programs, allowing a verification approach to handle a wider range of deadlock-free programs in the presence of monitors, channels, and other synchronization mechanisms.

**Acknowledgements.** This work was funded through Flemish Research Fund grant G.0058.13 and KU Leuven Research Fund grant OT/13/065. We thank three anonymous reviewers and Prof. Aleksandar Nanevski for their careful reading of our manuscript and their many insightful comments and suggestions.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Fragment Abstraction for Concurrent Shape Analysis**

Parosh Aziz Abdulla, Bengt Jonsson, and Cong Quy Trinh(B)

Uppsala University, Uppsala, Sweden cong-quy.trinh@it.uu.se

**Abstract.** A major challenge in automated verification is to develop techniques that are able to reason about fine-grained concurrent algorithms that consist of an unbounded number of concurrent threads, which operate on an unbounded domain of data values, and use unbounded dynamically allocated memory. Existing automated techniques consider the case where shared data is organized into singly-linked lists. We present a novel shape analysis for automated verification of fine-grained concurrent algorithms that can handle heap structures which are more complex than just singly-linked lists, in particular skiplists and arrays of singly-linked lists, while at the same time handling an unbounded number of concurrent threads, an unbounded domain of data values (including timestamps), and an unbounded shared heap. Our technique is based on a novel shape abstraction, which represents a set of heaps by a set of *fragments*. A fragment is an abstraction of a pair of heap cells that are connected by a pointer field. We have implemented our approach and applied it to automatically verify correctness, in the sense of linearizability, of most linearizable concurrent implementations of sets, stacks, and queues known to us in the literature, which employ singly-linked lists, skiplists, or arrays of singly-linked lists with timestamps.

### **1 Introduction**

Concurrent algorithms with an unbounded number of threads that concurrently access a dynamically allocated shared state are of central importance in a large number of software systems. They provide efficient concurrent realizations of common interface abstractions, and are widely used in libraries, such as the Intel Threading Building Blocks or the java.util.concurrent package. They are notoriously difficult to get correct and verify, since they often employ fine-grained synchronization and avoid locking when possible. A number of bugs in published algorithms have been reported [13,30]. Consequently, significant research efforts have been directed towards developing techniques to verify correctness of such algorithms. One widely-used correctness criterion is that of *linearizability*, meaning that each method invocation can be considered to occur atomically at some point between its call and return. Many of the developed verification techniques require significant *manual* effort for constructing correctness proofs (e.g., [25,41]), in some cases with the support of an interactive theorem prover (e.g., [11,35,40]). Development of automated verification techniques remains a difficult challenge.

A major challenge for the development of automated verification techniques is that such techniques must be able to reason about fine-grained concurrent algorithms that are infinite-state in many dimensions: they consist of an unbounded number of concurrent threads, which operate on an unbounded domain of data values, and use unbounded dynamically allocated memory. Perhaps the hardest of these challenges is that of handling dynamically allocated memory. Consequently, existing techniques that can automatically prove correctness of such fine-grained concurrent algorithms restrict attention to the case where heap structures represent shared data by singly-linked lists [1,3,18,36,42]. Furthermore, many of these techniques impose additional restrictions on the considered verification problem, such as bounding the number of accessing threads [4,43,45]. However, in many concurrent data structure implementations the heap represents more sophisticated structures, such as skiplists [16,22,38] and arrays of singly-linked lists [12]. There are no techniques that have been applied to automatically verify concurrent algorithms that operate on such data structures.

*Contributions.* In this paper, we present a technique for automatic verification of concurrent data structure implementations that operate on dynamically allocated heap structures which are more complex than just singly-linked lists. Our framework is the first that can automatically verify concurrent data structure implementations that employ singly linked lists, skiplists [16,22,38], as well as arrays of singly linked lists [12], at the same time as handling an unbounded number of concurrent threads, an unbounded domain of data values (including timestamps), and an unbounded shared heap.

Our technique is based on a novel shape abstraction, called *fragment abstraction*, which in a simple and uniform way is able to represent several different classes of unbounded heap structures. Its main idea is to represent a set of heap states by a set of *fragments*. A fragment represents two heap cells that are connected by a pointer field. For each of its cells, the fragment represents the contents of its non-pointer fields, together with information about how the cell can be reached from the program's global pointer variables. The latter information consists of both: (i) *local* information, saying which pointer variables point directly to them, and (ii) *global* information, saying how the cell can reach, and be reached from (by following chains of pointers), heap cells that are globally significant, typically since some global variable points to them. A set of fragments represents the set of heap states in which any two pointer-connected nodes are represented by some fragment in the set. Thus, a set of fragments describes the set of heaps that can be formed by "piecing together" fragments in the set. The combination of local and global information in fragments supports reasoning about the sequence of cells that can be accessed by threads that traverse the heap by following pointer fields in cells and pointer variables: the local information captures properties of the cell fields that can be accessed as a thread dereferences a pointer variable or a pointer field; the global information also captures whether certain significant accesses will be possible at all by following a sequence of pointer fields. This support for reasoning about patterns of cell accesses enables automated verification of reachability and other functional properties.
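The idea can be made concrete in a few lines of Python. The encoding below, with `Cell` carrying abstracted non-pointer data plus reachability tags, is our own simplification for illustration, not the paper's formal definition:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    data: str        # abstracted non-pointer fields, e.g. 'lo' / 'hi'
    tags: frozenset  # local/global reachability info, e.g. {'from_H'}

@dataclass(frozen=True)
class Fragment:
    src: Cell        # a fragment abstracts one pointer-connected pair
    dst: Cell

def abstract_heap(cells, succ, abstract_cell):
    """Abstract a concrete heap (cells plus a successor map for one pointer
    field) into the set of fragments covering every connected pair."""
    return {Fragment(abstract_cell(c), abstract_cell(succ[c]))
            for c in cells if c in succ}

def represents(frag_set, heap_cells, succ, abstract_cell):
    """A set of fragments represents a heap iff every pointer-connected
    pair of that heap is represented by some fragment in the set."""
    return abstract_heap(heap_cells, succ, abstract_cell) <= frag_set
```

A set of fragments thus denotes all heaps that can be pieced together from its members, which is what keeps the abstract domain finite.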

Fragment abstraction can (and should) be combined, in a natural way, with data abstractions for handling unbounded data domains and with thread abstractions for handling an unbounded number of threads. For the latter we adapt the successful thread-modular approach [5], which represents the local state of a single, but arbitrary thread, together with the part of the global state and heap that is accessible to that thread. Our combination of fragment abstraction, thread abstraction, and data abstraction results in a finite abstract domain, thereby guaranteeing termination of our analysis.

We have implemented our approach and applied it to automatically verify correctness, in the sense of linearizability, of a large number of concurrent data structure algorithms, described in a C-like language. More specifically, we have automatically verified linearizability of most linearizable concurrent implementations of sets, stacks, queues, and priority queues, which employ singly-linked lists, skiplists, or arrays of timestamped singly-linked lists, that are known to us in the literature on concurrent data structures. For this verification, we specify linearizability using the simple and powerful technique of *observers* [1,7,9], which reduces the criterion of linearizability to a simple reachability property. To verify implementations of stacks and queues, the application of observers can be done completely automatically without any manual steps, whereas for implementations of sets, the verification relies on light-weight user annotation of how linearization points are placed in each method [3].

The fact that our fragment abstraction has been able to automatically verify all supplied concurrent algorithms, also those that employ skiplists or arrays of SLLs, indicates that the fragment abstraction is a simple mechanism for capturing both the local and global information about heap cells that is necessary for verifying correctness, in particular for concurrent algorithms where an unbounded number of threads interact via a shared heap.

*Outline.* In the next section, we illustrate our fragment abstraction on the verification of a skiplist-based concurrent set implementation. In Sect. 3 we introduce our model for programs, and of observers for specifying linearizability. In Sect. 4 we describe in more detail our fragment abstraction for skiplists; note that singly-linked lists can be handled as a simple special case of skiplists. In Sect. 5 we describe how fragment abstraction applies to arrays of singly-linked lists with timestamp fields. Our implementation and experiments are reported in Sect. 6, followed by conclusions in Sect. 7.

*Related Work.* A large number of techniques have been developed for representing heap structures in automated analysis, including, e.g., separation logic and various related graph formalisms [10,15,47], other logics [33], automata [23], or graph grammars [19]. Most works apply these to sequential programs.

Approaches for automated verification of concurrent algorithms are limited to the case of singly-linked lists [1,3,18,36,42]. Furthermore, many of these techniques impose additional restrictions on the considered verification problem, such as bounding the number of accessing threads [4,43,45].

In [1], concurrent programs operating on SLLs are analyzed using an adaptation of a transitive closure logic [6], combined with tracking of simple sortedness properties between data elements; the approach cannot represent the patterns observed by threads when following sequences of pointers inside the heap, and so has not been applied to concurrent set implementations. In our recent work [3], we extended this approach to handle SLL implementations of concurrent sets by adapting a well-known abstraction of singly-linked lists [28] for concurrent programs. The resulting technique is specifically tailored for singly-linked lists. Our fragment abstraction is significantly simpler conceptually, and can therefore be adapted also for other classes of heap structures. The approach of [3] is the only one with a shape representation strong enough to verify the concurrent set implementations we consider that are based on sorted and non-sorted singly-linked lists with non-optimistic contains (or lookup) operations, such as the lock-free sets of *HM* [22], *Harris* [17], or *Michael* [29], or the unordered set of [48]. As shown in Sect. 6, our fragment abstraction can handle these, as well as algorithms employing skiplists and arrays of singly-linked lists.

There is no previous work on automated verification of skiplist-based concurrent algorithms. Verification of *sequential* skiplist algorithms has been addressed under restrictions, such as limiting the number of levels to two or three [2,23]. The work [34] generates verification conditions for statements in sequential skiplist implementations. All these works assume that skiplists have the well-formedness property that any higher-level list is a sublist of any lower-level list, which is true for sequential skiplist algorithms but false for several concurrent ones, such as [22,26].

Concurrent algorithms based on arrays of SLLs that include timestamps, e.g., the algorithms in [12], have proven rather challenging to verify. Only recently has the TS stack been verified by non-automated techniques [8] using a non-trivial extension of forward simulation, and the TS queue been verified manually by a new technique based on partial orders [24,37]. We have verified both these algorithms automatically using fragment abstraction.

Our fragment abstraction is related in spirit to other formalisms that abstract dynamic graph structures by defining some form of equivalence on their nodes (e.g., [23,33,46]). These have been applied to verify functional correctness of fine-grained concurrent algorithms for a limited number of SLL-based algorithms. Fragment abstraction's representation of both local and global information extends the applicability of this class of techniques.

### **2 Overview**

In this section, we illustrate our technique on the verification of correctness, in the sense of linearizability, of a concurrent set data structure based on skiplists, namely the Lock-Free Concurrent Skiplist from [22, Sect. 14.4]. Skiplists provide expected logarithmic time search while avoiding some of the complications of tree structures. Informally, a skiplist consists of a collection of sorted linked lists, each of which is located at a *level*, ranging from 1 up to a maximum value. Each skiplist node has a key value and participates in the lists at levels 1 up to its *height*.

The skiplist has sentinel head and tail nodes with maximum heights and key values −∞ and +∞, respectively. The lowest-level list (at level 1) constitutes an ordered list of all nodes in the skiplist. Higher-level lists are increasingly sparse sublists of the lowest-level list, and serve as shortcuts into lower-level lists. Figure 1 shows an example of a skiplist of height 3. It has head and tail nodes of height 3, two nodes of height 2, and one node of height 1.

**Fig. 1.** An example of a skiplist

The algorithm has three main methods, namely add, contains and remove. The method add(x) adds x to the set and returns true iff x was not already in the set; remove(x) removes x from the set and returns true iff x was in the set; and contains(x) returns true iff x is in the set. All methods rely on a method find to search for a given key. In this section, we briefly describe the find and add methods. Figure 2 shows the code for these two methods.

In the algorithm, each heap node has a key field, a height, an array of next pointers indexed from 1 up to its height, and an array of marked fields which are true if the node has been logically removed at the corresponding level. Removal of a node (at a certain level k) occurs in two steps: first the node is logically removed by setting its marked flag at level k to true; thereafter the node is physically removed by unlinking it from the level-k list. The algorithm must be able to update the next[k] pointer and marked[k] field together as one atomic operation; this is standardly implemented by encoding them in a single word. The head and tail nodes of the skiplist are pointed to by global pointer variables H and T, respectively. The find method traverses the list at decreasing levels using two local variables pred and curr, starting at the head and at the maximum level (lines 5–6). At each level k it sets curr to pred.next[k] (line 7). During the traversal, the pointer variable succ and boolean variable marked are atomically assigned the values of curr.next[k] and curr.marked[k], respectively (lines 9 and 14). After that, the method repeatedly removes marked nodes at the current level (lines 10 to 14). This is done by using a CompareAndSwap (CAS) command (line 11), which tests whether pred.next[k] and pred.marked[k] are equal to curr and false, respectively. If this test succeeds, it replaces them with succ and false and returns true; otherwise, the CAS returns false. During the traversal at level k, pred and curr are advanced until pred points to the node with the largest key at level k that is smaller than x (lines 15–18). Thereafter, the resulting values of pred and curr are recorded into preds[k] and succs[k] (lines 19, 20), whereafter traversal continues one level below until it reaches the bottom level. Finally, the method returns true if the key value of curr is equal to x; otherwise, it returns false, meaning that a node with key x is not found.

**Fig. 2.** Code for the find and add methods of the skiplist algorithm. (Color figure online)

The add method uses find to check whether a node with key x is already in the list. If so, it returns false; otherwise, a new node is created with randomly chosen height h (line 7), and with next pointers at levels 1 to h initialised to the corresponding elements of succs (lines 8 and 9). Thereafter, the new node is added into the list by linking it into the bottom-level list between the preds[1] and succs[1] pointers returned by find. This is achieved by using a CAS to make preds[1].next[1] point to the new node (line 13). If the CAS fails, the add method restarts from the beginning (line 3) by calling find again, etc. Otherwise, add proceeds with linking the new node into the list at increasingly higher levels (lines 16 to 22). For each higher level k, it makes preds[k].next[k] point to the new node if it is still valid (line 20); otherwise find is called again to recompute preds[k] and succs[k] on the remaining unlinked levels (line 22). Once all levels are linked, the method returns true.
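The traversal and linking structure of these two methods can be seen in a sequential simplification, stripped of marking, CAS, and retries (a Python sketch of our own; it is not the lock-free algorithm of Fig. 2):

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * (height + 1)  # next[1..height], as in the paper

class SkipList:
    MAX = 4  # maximum height (an arbitrary choice for this sketch)

    def __init__(self):
        # sentinel head and tail nodes of maximum height, keys -inf and +inf
        self.head = Node(float('-inf'), self.MAX)
        tail = Node(float('inf'), self.MAX)
        for k in range(1, self.MAX + 1):
            self.head.next[k] = tail

    def find(self, x):
        """Traverse at decreasing levels, recording pred/succ per level."""
        preds = [None] * (self.MAX + 1)
        succs = [None] * (self.MAX + 1)
        pred = self.head
        for k in range(self.MAX, 0, -1):
            curr = pred.next[k]
            while curr.key < x:          # advance within level k
                pred, curr = curr, curr.next[k]
            preds[k], succs[k] = pred, curr
        return preds, succs, succs[1].key == x

    def add(self, x):
        preds, succs, found = self.find(x)
        if found:
            return False
        h = random.randint(1, self.MAX)  # randomly chosen height
        node = Node(x, h)
        for k in range(1, h + 1):        # link in at levels 1..h, bottom-up
            node.next[k] = succs[k]
            preds[k].next[k] = node
        return True
```

In the concurrent algorithm, each pointer update in add is instead a CAS that may fail and force a restart, and find additionally unlinks marked nodes as it traverses.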

To prepare for verification, we add a specification which expresses that the skiplist algorithm of Fig. 2 is a linearizable implementation of a set data structure, using the technique of *observers* [1,3,7,9]. For our skiplist algorithm, the user first instruments statements in each method that correspond to linearization points (LPs), so that their execution announces the corresponding atomic set operation. In Fig. 2, the LP of a successful add operation is at line 15 of the add method (denoted by a blue dot) when the CAS succeeds, whereas the LP of an unsuccessful add operation is at line 13 of the find method (denoted by a red dot). We must now verify that in any concurrent execution of a collection of method calls, the sequence of announced operations satisfies the semantics of the set data structure. This check is performed by an *observer*, which monitors the sequence of announced operations. The observer for the set data structure utilizes a register, which is initialized with a single, arbitrary key value. It checks that operations on this particular value follow set semantics, i.e., that successful add and remove operations on an element alternate and that contains operations are consistent with them. We form the cross-product of the program and the observer, synchronizing on operation announcements. This reduces the problem of checking linearizability to the problem of checking that in this cross-product, regardless of the initial observer register value, the observer cannot reach a state where the semantics of the set data structure has been violated.

To verify that the observer cannot reach a state where a violation is reported, we compute a symbolic representation of an invariant that is satisfied by all reachable configurations of the cross-product of a program and an observer. This symbolic representation combines thread abstraction, data abstraction, and our novel *fragment abstraction* to represent the heap state. Our *thread abstraction* adapts the thread-modular approach by representing only the view of a single, but arbitrary, thread th. Such a view consists of the local state of thread th, including the value of the program counter, the state of the observer, and the part of the heap that is accessible to thread th via pointer variables (local to th or global). Our *data abstraction* represents variables and cell fields that range over small finite domains by their concrete values, whereas variables and fields that range over the same domain as key fields are abstracted to constraints over their relative ordering (wrp. to <).

In our *fragment abstraction*, we represent the part of the heap that is accessible to thread th by a set of *fragments*. A fragment represents a pair of heap cells (accessible to th) that are connected by a pointer field, under the applied data abstraction. A fragment is a triple of form ⟨i, o, φ⟩, where i and o are *tags* that represent the two cells, and φ is a subset of {<, =, >} which constrains the order between the key fields of the cells. Each tag is a tuple tag = ⟨dabs, pvars, reachfrom, reachto, private⟩, where


Thus, the fragment contains both (i) *local* information about the cell's fields and the variables that point to it, and (ii) *global* information, representing how each cell in the pair can reach and be reached from (by following a chain of pointers) a small set of globally significant heap cells.

A set of fragments represents the set of heap structures in which each pair of pointer-connected nodes is represented by some fragment in the set. Put differently, a set of fragments describes the set of heaps that can be formed by "piecing together" pairs of pointer-connected nodes that are represented by some fragment in the set. This "piecing together" must be both locally consistent (appending only fragments that agree on their common node), and globally consistent (respecting the global reachability information).

**Fig. 3.** A structure of a cell

When applying fragment abstraction to skiplists, we use two types of fragments: *level-1 fragments* for nodes connected by a next[1] pointer, and *higher-level fragments* for nodes connected by a higher-level pointer. That is, we abstract all levels k ≥ 2 by the single abstract level higher. Thus, a pointer or non-pointer variable of form v[k], indexed by a level k ≥ 2, is abstracted to v[higher].

**Fig. 4.** A heap shape of a 3-level skiplist with two threads active

Let us illustrate how fragment abstraction applies to the skiplist algorithm. Figure 4 shows an example heap state of the skiplist algorithm with three levels. Each heap cell is shown with the values of its fields as described in Fig. 3. In addition, each cell is labeled by the pointer variables that point to it; we use preds(i)[k] to denote the local variable preds[k] of thread thi, and similarly for other local variables. In the heap state of Fig. 4, thread th1 is trying to add a new node of height 1 with key 9, and has reached line 8 of the add method. Thread th2 is trying to add a new node with key 20 and has done its first iteration of the for loop in the find method. The variables preds(2)[3] and currs(2)[3] have been assigned so that the new node (which has not yet been created) will be inserted between node 5 and the tail node. The observer is not shown, but the value of the observer register is 9; thus it currently tracks the add operation of th1.

Figure 5 illustrates how pairs of heap nodes can be represented by fragments. As a first example, in the view of thread th1, the two left-most cells in Fig. 4 are represented by the level-1 fragment v1 in Fig. 5. Here, the variable preds(1)[3] is represented by preds[higher]. The mapping π1 represents the data abstraction of the key field, here saying that it is smaller than the value 9 of the observer register. The two left-most cells are also represented by a higher-level fragment, viz. v8. The pair consisting of the two sentinel cells (with keys −∞ and +∞) is represented by the higher-level fragment v9. In each fragment, the abstraction dabs of the non-pointer fields is shown inside each tag of the fragment. The order constraint φ is shown as a label on the arrow between the two tags. Above each tag is pvars.

The first row under each tag is reachfrom, whereas the second row is reachto. Figure 5 shows a set of fragments that is sufficient to represent the part of the heap that is accessible to th1 in the configuration in Fig. 4. There are 11 fragments, named v1, ..., v11. Three of these (v6, v7, and v11) consist of a tag that points to ⊥. All other fragments consist of a pair of pointer-connected tags. The fragments v1, ..., v6 are level-1 fragments, whereas v7, ..., v11 are higher-level fragments. The private field of the input tag of v7 is true, whereas the private fields of the tags of all other fragments are false.

**Fig. 5.** Fragment abstraction of skiplist algorithm

To verify linearizability of the algorithm in Fig. 2, we must represent several key invariants of the heap. These include (among others):


Let us illustrate how such invariants are captured by our fragment abstraction. (1) All level-1 fragments are strictly sorted, implying that the bottom-level list is strictly sorted. (2) For each higher-level fragment v, if H ∈ v.i.reachfrom then also H ∈ v.o.reachfrom, implying (together with v.φ = {<}) that the cell represented by v.o is reachable from the cell represented by v.i by a sequence of next[1] pointers. (3) This is verified by inspecting each tag: v3 contains the only unreachable tag, and it is also marked. (4) The fragments express this property in the case where the value of key is the same as the value of the observer register x. Since the invariant holds for any value of x, this property is sufficiently represented for purposes of verification.

### **3 Concurrent Data Structure Implementations**

In this section, we introduce our representation of concurrent data structure implementations, define the correctness criterion of linearizability, and introduce observers and how to use them for specifying linearizability.

#### **3.1 Concurrent Data Structure Implementations**

We first introduce (sequential) data structures. A *data structure* DS is a pair ⟨D, M⟩, where D is a (possibly infinite) *data domain* and M is an alphabet of *method names*. An *operation op* is of the form m(d*in*, d*out*), where m ∈ M is a method name, and d*in*, d*out* are the *input* resp. *output* values, each of which is either in D or in some small finite domain F, which includes the booleans. For some method names, the input or output value is absent from the operation. A *trace* of DS is a sequence of operations. The (sequential) semantics of a data structure DS is given by a set [[DS]] of allowed traces. For example, a Set data structure has method names add, remove, and contains. An example of an allowed trace is add(3, true) contains(4, false) contains(3, true) remove(3, true).
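The sequential semantics [[DS]] of the set example can be sketched as a trace checker; the function name and tuple encoding of operations are our own:

```python
def valid_set_trace(trace):
    """Check membership in [[DS]] for the Set data structure.

    trace: list of (method, d_in, d_out) operations.
    """
    s = set()
    for m, d_in, d_out in trace:
        if m == 'add':
            ok = d_in not in s     # add succeeds iff the key was absent
            s.add(d_in)
        elif m == 'remove':
            ok = d_in in s         # remove succeeds iff the key was present
            s.discard(d_in)
        elif m == 'contains':
            ok = d_in in s
        else:
            return False
        if d_out != ok:
            return False
    return True
```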

A *concurrent data structure implementation* operates on a shared state consisting of shared global variables and a shared heap. It assigns, to each method name, a method which performs operations on the shared state. It also comes with a method named init, which initializes its shared state.

A *heap (state)* H consists of a finite set C of cells, including the two special cells null and ⊥ (dangling). Heap cells have a fixed set F of fields, namely non-pointer fields that assume values in D or F, and possibly lock fields. We use the term D*-field* for a non-pointer field that assumes values in D, and the terms F*-field* and *lock field* with analogous meaning. Furthermore, each cell has one or several named pointer fields. For instance, in data structure implementations based on singly-linked lists, each heap cell has a pointer field named next; in implementations based on skiplists there is an array of pointer fields named next[k], where k ranges from 1 to a maximum level.

Each method declares local variables and a method body. The set of local variables includes the input parameter of the method and the program counter pc. A *local state* loc of a thread th defines the values of its local variables. The global variables can be accessed by all threads, whereas local variables can be accessed only by the thread which is invoking the corresponding method. Variables are either pointer variables (to heap cells), locks, or data variables assuming values in D or F. We assume that all global variables are pointer variables. The body is built in the standard way from atomic commands, using standard control flow constructs (sequential composition, selection, and loop constructs). Atomic commands include assignments between variables, or fields of cells pointed to by a pointer variable. Method execution is terminated by executing a return command, which may return a value. The command new Node() allocates a new structure of type Node on the heap, and returns a reference to it. The compare-and-swap command CAS(a, b, c) atomically compares the values of a and b. If equal, it assigns the value of c to a and returns true; otherwise, it leaves a unchanged and returns false. We assume a memory management mechanism, which automatically collects garbage, and ensures that a new cell is fresh, i.e., has not been used before; this avoids the so-called ABA problem (e.g., [31]).
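The CAS(a, b, c) semantics just described can be sketched as follows, modeling the mutable variable a by a small reference cell (Ref is our illustrative wrapper; a real implementation would use an atomic hardware instruction):

```python
class Ref:
    """A mutable cell standing in for a shared variable or field."""
    def __init__(self, value):
        self.value = value

def cas(a, b, c):
    # Compare a's value with b; on equality assign c to a and return True,
    # otherwise leave a unchanged and return False.
    if a.value == b:
        a.value = c
        return True
    return False
```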

We define a *program* P (over a concurrent data structure) to consist of an arbitrary number of concurrently executing threads, each of which executes a method that performs an operation on the data structure. The shared state is initialized by the init method prior to the start of program execution. A *configuration* of a program P is a tuple cP = ⟨T, LOC, H⟩, where T is a set of threads, H is a heap, and LOC maps each thread th ∈ T to its local state LOC(th). We assume concurrent execution according to the sequentially consistent memory model. The behavior of a thread th executing a method can be formalized as a transition relation −→th on pairs ⟨loc, H⟩ consisting of a local state loc and a heap state H. The behavior of a program P can be formalized by a transition relation −→P on program configurations; each step corresponds to a move of a single thread. That is, there is a transition of form ⟨T, LOC, H⟩ −→P ⟨T, LOC[th ← loc′], H′⟩ whenever some thread th ∈ T has a transition ⟨loc, H⟩ −→th ⟨loc′, H′⟩ with LOC(th) = loc.

### **3.2 Linearizability**

In a concurrent data structure implementation, we represent the calling of a method by a *call action* call*o* m(d*in*), and the return of a method by a *return action* ret*o* m(d*out*), where o ∈ N is an *action identifier*, which links the call and return of each method invocation. A *history* h is a sequence of actions such that (i) different occurrences of return actions have different action identifiers, and (ii) for each return action a2 in h there is a unique *matching* call action a1 with the same action identifier and method name, which occurs before a2 in h. A call action which does not match any return action in h is said to be *pending*. A history without pending call actions is said to be *complete*. A *completed extension* of h is a complete history h′ obtained from h by appending (at the end) zero or more return actions that are matched by pending call actions in h, and thereafter removing the call actions that are still pending. For action identifiers o1, o2, we write o1 ≺h o2 to denote that the return action with identifier o1 occurs before the call action with identifier o2 in h. A complete history is *sequential* if it is of the form a1a′1 a2a′2 ··· ana′n, where a′i is the matching return action of ai for all i : 1 ≤ i ≤ n, i.e., each call action is immediately followed by its matching return action.
We identify a sequential history of the above form with the corresponding trace *op*1 *op*2 ··· *op*n, where *op*i = m(d*in*i, d*out*i), ai = call*oi* m(d*in*i), and a′i = ret*oi* m(d*out*i), i.e., we merge each call action together with the matching return action into one operation. A complete history h′ is a *linearization* of h if (i) h′ is a permutation of h, (ii) h′ is sequential, and (iii) o1 ≺h′ o2 whenever o1 ≺h o2, for each pair of action identifiers o1 and o2. A sequential history h is *valid* wrt. DS if the corresponding trace is in [[DS]]. We say that h is *linearizable* wrt. DS if there is a completed extension of h which has a linearization that is valid wrt. DS. We say that a program P is linearizable wrt. DS if, in each possible execution, the sequence of call and return actions is linearizable wrt. DS.
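The definition above can be sketched as a brute-force check for complete histories over the set data structure: enumerate permutations, keep those that respect the precedence order ≺h, and test validity. Exponential and for illustration only; the timestamp encoding of call/return actions is our own:

```python
from itertools import permutations

def valid_set_trace(trace):
    # Sequential set semantics: d_out must match the abstract set state.
    s = set()
    for m, d_in, d_out in trace:
        if m == 'add':
            ok = d_in not in s; s.add(d_in)
        elif m == 'remove':
            ok = d_in in s; s.discard(d_in)
        else:                       # contains
            ok = d_in in s
        if d_out != ok:
            return False
    return True

def linearizable(history):
    """history: list of (call_time, ret_time, method, d_in, d_out) for a
    complete history; o1 precedes o2 iff ret(o1) < call(o2)."""
    for perm in permutations(history):
        # condition (iii): an operation that returned before another was
        # called must precede it in the linearization
        if any(perm[j][1] < perm[i][0]
               for i in range(len(perm)) for j in range(i + 1, len(perm))):
            continue
        if valid_set_trace([(m, d_in, d_out) for _, _, m, d_in, d_out in perm]):
            return True
    return False
```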

We specify linearizability using the technique of *observers* [1,3,7,9]. Depending on the data structure, we apply it in two different ways.


**Fig. 6.** Set observer.

Formally, an observer O is a tuple ⟨S^O, s^O_init, X^O, Δ^O, s^O_acc⟩, where S^O is a finite set of *observer locations* including the *initial location* s^O_init and the *accepting location* s^O_acc, X^O is a finite set of *registers*, and Δ^O is a finite set of *transitions*. For observers that monitor sequences of operations, transitions are of the form ⟨s1, m(x*in*, x*out*), s2⟩, where m ∈ M is a method name and x*in* and x*out* are either registers or constants, i.e., transitions are labeled by operations whose input or output data may be parameterized on registers. The observer processes a sequence of operations one operation at a time. If there is a transition whose label (after replacing registers by their values) matches the operation, such a transition is performed. If there is no such transition, the observer remains in its current location. The observer accepts a sequence if it can be processed in such a way that an accepting location is reached. The observer is defined in such a way that it accepts precisely those sequences that are *not* in [[DS]]. Figure 6 depicts an observer for the set data structure.
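The set observer of Fig. 6 can be sketched as such a transition system. The concrete locations and transition table below encode the informal description (alternation of successful add/remove on the register value, consistency of contains); the published figure may arrange them differently:

```python
def run_observer(ops, x):
    """Process announced operations with register value x; return the final
    location, where 'acc' is the accepting (violation) location."""
    # 's0': x not in the set, 's1': x in the set
    delta = {
        ('s0', 'remove', True): 'acc',    # successful remove of an absent key
        ('s0', 'contains', True): 'acc',  # contains sees an absent key
        ('s0', 'add', True): 's1',
        ('s1', 'add', True): 'acc',       # successful add of a present key
        ('s1', 'contains', False): 'acc', # contains misses a present key
        ('s1', 'remove', True): 's0',
    }
    loc = 's0'
    for m, d_in, d_out in ops:
        if d_in != x:
            continue                          # label does not match the register
        loc = delta.get((loc, m, d_out), loc)  # no matching transition: stay
        if loc == 'acc':
            break
    return loc
```

Checking a program then amounts to verifying that no announced sequence reaches 'acc' for any register value x.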

To check that no execution of the program announces a sequence of labels that can drive the observer to an accepting location, we form the cross-product S = P ⊗ O of the program P and the observer O, synchronizing on common transition labels. Thus, configurations of S are of the form ⟨cP, s, ρ⟩, consisting of a program configuration cP, an observer location s, and an assignment ρ of values in D to the observer registers. Transitions of S are of the form ⟨cP, s, ρ⟩ −→S ⟨c′P, s′, ρ⟩, obtained from a transition cP −→P c′P of the program with some (possibly empty) label λ, where the observer makes a matching transition s −→ s′ labeled by λ if such a transition is enabled; otherwise s′ = s. Note that the observer registers are not changed. We also add straightforward instrumentation to check that each method invocation announces exactly one operation, whose input and output values agree with the method's parameters and return value. This reduces the problem of checking linearizability to the problem of checking that in this cross-product, the observer cannot reach an accepting error location.

### **4 Verification Using Fragment Abstraction for Skiplists**

In the previous section, we reduced the problem of verifying linearizability to the problem of verifying that, in any execution of the cross-product of a program and an observer, the observer cannot reach an accepting location. We perform this verification by computing a symbolic representation of an invariant that is satisfied by all reachable configurations of the cross-product, using an abstract-interpretation-based fixpoint procedure: starting from a symbolic representation of the set of initial configurations, we repeatedly perform symbolic postcondition computations that extend the symbolic representation by the effect of any execution step of the program, until convergence.

In Sect. 4.1, we define in more detail our symbolic representation for skiplists, focusing in particular on the use of fragment abstraction, and thereafter (in Sect. 4.2) describe the symbolic postcondition computation. Since singly-linked lists are a trivial special case of skiplists, we can use the relevant part of this technique also for programs based on singly-linked lists.

### **4.1 Symbolic Representation**

This subsection contains a more detailed description of our symbolic representation for programs that operate on skiplists, which was introduced in Sect. 2. We first describe the data abstraction, thereafter the fragment abstraction, and finally their combination into a symbolic representation.

**Data Abstraction.** Our data abstraction is defined by assigning an abstract domain to each concrete domain of data values, as follows.


**Fragment Abstraction.** Let us now define our fragment abstraction for skiplists. For presentation purposes, we assume that each heap cell has at most one D-field, named data. For an observer register xi, let an xi*-cell* be a heap cell whose data field has the same value as xi.

Since the number of levels is unbounded, we define an abstraction for levels. Let k be a level. Define the abstraction of a pointer variable of form p[k], denoted p̂[k], to be p[1] if k = 1, and to be p[higher] if k ≥ 2. That is, this abstraction does not distinguish different higher levels.
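A minimal sketch of this level abstraction (the string encoding of abstracted variables is our choice):

```python
def abstract_var(p, k):
    """Abstraction of the indexed pointer variable p[k]: level 1 is kept,
    all levels k >= 2 collapse to the abstract index 'higher'."""
    return f"{p}[1]" if k == 1 else f"{p}[higher]"
```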

A *tag* is a tuple tag = ⟨dabs, pvars, reachfrom, reachto, private⟩, where (i) dabs is a mapping from non-pointer fields to their corresponding abstract domains; if a non-pointer field is an array indexed by levels, then the abstract domain is that for single elements: e.g., the abstract domain for the array marked in Fig. 2 is simply the set of booleans, (ii) pvars is a set of abstracted pointer variables, (iii) reachfrom and reachto are sets of global pointer variables and observer registers, and (iv) private is a boolean value.

For a heap cell c that is accessible to thread th in a configuration cS, and a tag tag = ⟨dabs, pvars, reachfrom, reachto, private⟩, we let c |=cS th,k tag denote that c satisfies the tag tag "at level k". More precisely, this means that


Note that the global information represented by the fields reachfrom and reachto concerns *only* reachability via level-1 pointers.

A *skiplist fragment* v (or just fragment) is a triple of form ⟨i, o, φ⟩, of form ⟨i, null⟩, or of form ⟨i, ⊥⟩, where i and o are tags and φ is a subset of {<, =, >}. Each skiplist fragment additionally has a *type*, which is either *level-1* or *higher-level* (note that a level-1 fragment can otherwise be identical to a higher-level fragment). For a cell c which is accessible to thread th, and a fragment v of form ⟨i, o, φ⟩, let c |=cS th,k v denote that the next[k] field of c points to a cell c′ such that c |=cS th,k i, and c′ |=cS th,k o, and c.data ∼ c′.data for some ∼ ∈ φ. The definition of c |=cS th,k v is adapted to fragments of form ⟨i, null⟩ and ⟨i, ⊥⟩ in the obvious way. For a fragment v = ⟨i, o, φ⟩, we often use v.i for i and v.o for o, etc.
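The tag and fragment records can be sketched as follows; the field names follow the definitions above, while the concrete Python types (frozensets, a string for the null/⊥ forms, a boolean for the fragment type) are our own encoding:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    dabs: tuple          # abstracted non-pointer fields, e.g. (('marked', False),)
    pvars: frozenset     # abstracted pointer variables pointing to the cell
    reachfrom: frozenset # global variables/observer registers reaching the cell
    reachto: frozenset   # global variables/observer registers reached from it
    private: bool

@dataclass(frozen=True)
class Fragment:
    i: Tag
    o: object            # a Tag, or the strings 'null' / 'bot' for the other forms
    phi: frozenset       # subset of {'<', '=', '>'}
    level1: bool         # type: level-1 (True) or higher-level (False)
```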

Let V be a set of fragments. A global configuration cS satisfies V wrp. to th, denoted cS |=heap th V, if


Intuitively, a set of fragments represents the set of heap states in which each pair of cells connected by a next[1] pointer is represented by a level-1 fragment, and each pair of cells connected by a next[k] pointer for k ≥ 2 is represented by a higher-level fragment which represents array fields of cells at index k.

**Symbolic Representation.** We can now define our abstract symbolic representation.

Define a *local symbolic configuration* σ to be a mapping from local non-pointer variables (including the program counter) to their corresponding abstract domains. We let cS |=loc th σ denote that in the global configuration cS, the local configuration of thread th satisfies the local symbolic configuration σ, defined in the natural way. For a local symbolic configuration σ, an observer location s, a set V of fragments, and a thread th, we write cS |=th ⟨σ, s, V⟩ to denote that (i) cS |=loc th σ, (ii) the observer is in location s, and (iii) cS |=heap th V.

**Definition 1.** *A* symbolic representation Ψ *is a partial mapping from pairs of local symbolic configurations and observer locations to sets of fragments. A system configuration* cS *satisfies a symbolic representation* Ψ*, denoted* cS *sat* Ψ*, if for each thread* th*, the domain of* Ψ *contains a pair* ⟨σ, s⟩ *such that* cS |=th ⟨σ, s, Ψ(σ, s)⟩*.*

### **4.2 Symbolic Postcondition Computation**

The symbolic postcondition computation must ensure that the symbolic representation of the reachable configurations of a program is closed under execution of a statement by some thread. That is, given a symbolic representation Ψ, the symbolic postcondition operation must produce an extension Ψ′ of Ψ, such that whenever cS sat Ψ and cS −→S c′S, then c′S sat Ψ′. Let th be an arbitrary thread. Then cS sat Ψ means that Dom(Ψ) contains some pair ⟨σ, s⟩ with cS |=th ⟨σ, s, Ψ(σ, s)⟩. The symbolic postcondition computation must ensure that Dom(Ψ′) contains a pair ⟨σ′, s′⟩ such that c′S |=th ⟨σ′, s′, Ψ′(σ′, s′)⟩. In the thread-modular approach, there are two cases to consider, depending on which thread causes the step from cS to c′S.


In the following, we first describe the symbolic postcondition computation for local steps, and thereafter the intersection operation.

**Symbolic Postcondition Computation for Local Steps.** Let th be an arbitrary thread, assume that ⟨σ, s⟩ ∈ Dom(Ψ), and let V = Ψ(σ, s). For each statement that th can execute in a configuration cS with cS |=th ⟨σ, s, V⟩, we must compute a local symbolic configuration σ′, a new observer location s′, and a set V′ of fragments such that the resulting configuration c′S satisfies c′S |=th ⟨σ′, s′, V′⟩. This computation is done differently for each statement. For statements that do not affect the heap or pointer variables, this computation is standard, and affects only the local symbolic configuration, the observer location, and the dabs component of tags. We therefore here describe how to compute the effect of statements that update pointer variables or pointer fields of heap cells, since these are the most interesting cases. In this computation, the set V′ is constructed in two steps: (1) first, the level-1 fragments of V′ are computed, based on the level-1 fragments in V; (2) thereafter, the higher-level fragments of V′ are computed, based on the higher-level fragments in V and on how fragments in V are transformed when entered into V′. We first describe the construction of level-1 fragments, and thereafter the construction of higher-level fragments.

**Construction of Level-1 Fragments.** Let us first intuitively introduce the techniques used for constructing the level-1 fragments of V′. Consider a statement of form g := p, which assigns the value of a local pointer variable p to a global pointer variable g. The set V′ of fragments is obtained by modifying fragments in V to reflect the effect of the assignment. For any tag in a fragment, the dabs field is not affected. The pvars field is updated to contain the variable g if and only if it contained the variable p before the statement. The difficulty is to update the reachability information represented by the fields reachfrom and reachto, and in particular to determine whether g should be in such a set after the statement (note that if p were a global variable, then the corresponding reachability information for p would be in the fields reachfrom and reachto, and the update would be simple, reflecting that g and p become aliases). In order to construct V′ with sufficient precision, we therefore investigate whether the set of fragments V allows forming a heap in which a p-cell can reach or be reached from (by a sequence of next[1] pointers) a particular tag of a fragment. We also investigate whether a heap can be formed in which a p-cell can *not* reach or be reached from a particular tag. For each such successful investigation, the set V′ will contain a level-1 fragment with corresponding contents of its reachto and reachfrom fields.

The postcondition computation performs this investigation by computing a set of transitive-closure-like relations between level-1 fragments, which represent reachability via sequences of next[1] pointers (since only these are relevant for the reachfrom and reachto fields). First, say that two tags tag and tag′ are *consistent* (wrp. to a set of fragments V) if the concretizations of their dabs fields overlap, and if the other fields (pvars, reachfrom, reachto, and private) agree. Thus, tag and tag′ are consistent if there can exist a cell c accessible to th in some heap, with c |=cS th tag and c |=cS th tag′. Next, for two level-1 fragments v1 and v2 in a set V of fragments,


Intuitively, v1 →V v2 denotes that it is possible that c1.next[1] = c2 for some cells with c1 |=cS th,1 v1 and c2 |=cS th,1 v2. Similarly, v1 ↔V v2 denotes that it is possible that c1.next[1] = c2.next[1] for different cells c1 and c2 with c1 |=cS th,1 v1 and c2 |=cS th,1 v2 (note that these definitions also work for fragments containing null or ⊥). We use these relations to define the following derived relations on level-1 fragments:


We sometimes use, e.g., v2 +∗↔V v1 for v1 ∗+↔V v2. We say that v1 and v2 are *compatible* if v1 ∗→V v2, or v2 ∗→V v1, or v1 ∗∗↔V v2. Intuitively, if v1 and v2 are satisfied by two cells in the same heap state, then they must be compatible.
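A simplified sketch of these relations: fragments are reduced to pairs of hashable tag labels, and tag consistency is reduced to equality, so →V becomes "output tag equals input tag", ∗→V is its reflexive-transitive closure, and compatibility is approximated by reachability of a common fragment (helper names are ours):

```python
def step(V):
    """One-step relation ->_V : v1 -> v2 when v1's output tag could be the
    same cell as v2's input tag (consistency simplified to equality)."""
    return {(v1, v2) for v1 in V for v2 in V if v1[1] == v2[0]}

def closure(V):
    """Reflexive-transitive closure *->_V as a set of fragment pairs."""
    R = step(V) | {(v, v) for v in V}
    changed = True
    while changed:
        changed = False
        for a, b in list(R):
            for c, d in list(R):
                if b == c and (a, d) not in R:
                    R.add((a, d))
                    changed = True
    return R

def compatible(v1, v2, V):
    # v1 reaches v2, v2 reaches v1, or both reach a common fragment.
    R = closure(V)
    return ((v1, v2) in R or (v2, v1) in R
            or any((v1, w) in R and (v2, w) in R for w in V))
```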

**Fig. 7.** Illustration of some transitive closure-like relations between fragments

Figure 7 illustrates the above relations for a heap state with 13 heap cells. The figure depicts, in green, four pairs of heap cells connected by a next[1] pointer, which satisfy the four fragments v1, v2, v3, and v4, respectively. At the bottom are depicted the transitive-closure-like relations that hold between these fragments.

We can now describe the symbolic postcondition computation for statements that affect pointer variables or fields. This is a case analysis, and for space reasons we only include some representative cases.

First, consider a statement of form x := y, where x and y are local (to thread th) or global pointer variables. We must compute a set V′ of fragments which are satisfied by the configuration after the statement. We first compute the level-1 fragments in V′ as follows (higher-level fragments will be computed later). We observe that for any cell c which is accessible to th after the statement, there must be some level-1 fragment v′ in V′ with c |=cS th,1 v′. By assumption, c satisfies some fragment v in V before the statement, and is in the same heap state as the cell pointed to by y. This implies that v must be compatible with some fragment vy ∈ V such that ŷ ∈ vy.i.pvars (recall that ŷ is the abstraction of y, which, in the case that y is an array element, maps higher level indices to the abstract index higher). This means that we can make a case analysis on the possible relationships between v and any such vy. Thus, for each fragment vy ∈ V such that ŷ ∈ vy.i.pvars, we let V′ contain the fragments obtained by any of the following transformations on any fragment in V.

1. First, for the fragment vy itself, we let V′ contain v′y, which is the same as vy, except that

   - v′y.i.pvars = vy.i.pvars ∪ {x} and v′y.o.pvars = vy.o.pvars \ {x},

   and furthermore, if x is a global variable, then

   - v′y.i.reachfrom = vy.i.reachfrom ∪ {x} and v′y.i.reachto = vy.i.reachto ∪ {x},
   - v′y.o.reachfrom = vy.o.reachfrom ∪ {x} and v′y.o.reachto = vy.o.reachto \ {x}.

2. Next, for each fragment v with v →V vy, we let V′ contain v′, which is the same as v except that

   - v′.i.pvars = v.i.pvars \ {x},
   - v′.o.pvars = v.o.pvars ∪ {x},
   - v′.i.reachfrom = v.i.reachfrom \ {x} if x is a global variable,
   - v′.i.reachto = v.i.reachto ∪ {x} if x is a global variable,
   - v′.o.reachfrom = v.o.reachfrom ∪ {x} if x is a global variable,
   - v′.o.reachto = v.o.reachto ∪ {x} if x is a global variable.

3. We perform analogous inclusions for fragments v with v →+V vy, vy →*V v, vy ↔*+V v, and vy ↔*◦V v. Here, we show only the case vy ↔*+V v, in which we let V′ contain v′, which is the same as v except that x is removed from the sets v′.i.pvars, v′.o.pvars, v′.i.reachfrom, v′.i.reachto, v′.o.reachfrom, and v′.o.reachto.
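To make case 1 above concrete, the following sketch models a fragment as a pair of views (i and o), each with pvars/reachfrom/reachto sets, and applies the transformation for x := y to the fragment vy. The dict-based encoding and the example data are illustrative stand-ins for the paper's notation, not the authors' implementation.

```python
# Case 1 of the postcondition computation for x := y:
# the fragment v_y whose i-cell is pointed to by y gains x in its input view.
def assign_var(vy, x, x_is_global):
    """Return v'_y: vy updated for the statement x := y,
    assuming y is in vy['i']['pvars']."""
    v = {view: {f: set(s) for f, s in fields.items()}
         for view, fields in vy.items()}          # deep copy of both views
    v['i']['pvars'] |= {x}                        # x now points to the i-cell
    v['o']['pvars'] -= {x}                        # ... and no longer to the o-cell
    if x_is_global:                               # reach* sets track globals
        v['i']['reachfrom'] |= {x}                # i is reachable from x
        v['i']['reachto'] |= {x}                  # i reaches the x-cell (itself)
        v['o']['reachfrom'] |= {x}                # o is reachable from x via i
        v['o']['reachto'] -= {x}                  # o need not reach the x-cell
    return v

vy = {'i': {'pvars': {'y'}, 'reachfrom': {'Head'}, 'reachto': set()},
      'o': {'pvars': set(), 'reachfrom': {'Head'}, 'reachto': {'x'}}}
v2 = assign_var(vy, 'x', x_is_global=True)
assert v2['i']['pvars'] == {'y', 'x'}
assert v2['o']['reachto'] == set()
```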

The statement x := y.next[1] is handled rather similarly to the case x := y. Let us therefore describe the postcondition computation for statements of the form x.next[1] := y. This is the most difficult statement, since it is a destructive update of the heap; it affects reachability relations for both x and y. The postcondition computation makes a case analysis on how a fragment in V is related to some pair of compatible fragments vx, vy in V such that x ∈ vx.i.pvars and y ∈ vy.i.pvars. Thus, for each pair of compatible fragments vx, vy in V such that x ∈ vx.i.pvars and y ∈ vy.i.pvars, it is first checked whether the statement may form a cycle in the heap. This may happen if vy →*V vx, in which case the postcondition computation reports a potential cycle. Otherwise, V′ consists of

- fragments v′ derived from each fragment v with vx →*V v. In this case, for each subset regset of the observer registers in v.i.reachfrom ∩ vx.i.reachfrom, and for each subset regset′ of the observer registers in v.o.reachfrom ∩ vx.i.reachfrom, we let V′ contain a fragment v′ which is the same as v except that v′.i.reachfrom = (v.i.reachfrom \ vx.i.reachfrom) ∪ regset and v′.o.reachfrom = (v.o.reachfrom \ vx.i.reachfrom) ∪ regset′. An intuitive explanation for the rule for v′.i.reachfrom is that the global variables that can reach vx.i should clearly be removed from v′.i.reachfrom, since vx →*V v is false after the statement. However, for an observer register xi, an xi-cell can still reach v′.i if there are two xi-cells, one which reaches vx.i and another which reaches v′.i; we cannot precisely determine for which xi this may be the case, except that any such xi must be in v.i.reachfrom ∩ vx.i.reachfrom. The intuition for the rule for v′.o.reachfrom is analogous.
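The subset enumeration described above can be sketched as follows: global variables that reached v only through vx.i are dropped, while each subset of the ambiguous observer registers yields one successor fragment (an over-approximation). The function and variable names are illustrative, not the authors' implementation.

```python
# Enumerate the possible i.reachfrom sets of a fragment v (with vx reaching v)
# after the destructive update x.next[1] := y.
from itertools import chain, combinations

def subsets(s):
    """All subsets of a finite set, as a list of sets."""
    s = sorted(s)
    return [set(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def reachfrom_successors(v_i_reachfrom, vx_i_reachfrom, observer_regs):
    # Observer registers that occur on both sides are ambiguous: a second
    # register-cell may still reach v, so every subset must be considered.
    ambiguous = (v_i_reachfrom & vx_i_reachfrom) & observer_regs
    base = v_i_reachfrom - vx_i_reachfrom   # globals via vx.i are dropped
    return [base | regset for regset in subsets(ambiguous)]

succ = reachfrom_successors({'Head', 'x1', 'x2'}, {'Head', 'x1'}, {'x1', 'x2'})
# 'Head' is dropped (it reached v via vx.i); 'x1' is ambiguous; 'x2' is kept.
assert {'x2'} in succ and {'x1', 'x2'} in succ and len(succ) == 2
```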

**Construction of Higher-Level Fragments.** Based on the above construction of level-1 fragments, the set of higher-level fragments in V′ is obtained as follows. For each higher-level fragment v ∈ V, let v1 and v2 be level-1 fragments such that v1.i.tag = v.i.tag and v2.i.tag = v.o.tag. For any fragments v′1 and v′2 that are derived from v1 and v2, respectively, V′ contains a higher-level fragment v′ which is the same as v except that (i) v′.i.pvars = v′1.i.pvars and v′.o.pvars = v′2.i.pvars, (ii) v′.i.reachfrom = v′1.i.reachfrom and v′.o.reachfrom = v′2.i.reachfrom, and (iii) v′.i.reachto = v′1.i.reachto and v′.o.reachto = v′2.i.reachto. In addition, a statement of the form x.next[k] := y for k ≥ 2 creates a new fragment. The formation of this fragment is simpler than for the statement x.next[1] := y, since reachability via next[1]-pointers is preserved.

**Symbolic Postcondition Computation for Interference Steps.** Here, the key step is the *intersection* operation, which takes two sets of fragments V1 and V2 and produces a set of joint fragments V1,2, such that any configuration whose heap satisfies Vi with respect to thi for i = 1, 2 also satisfies V1,2 with respect to the pair th1, th2 (satisfaction with respect to a pair of threads is defined in the natural way). This means that for each heap cell c accessible to either th1 or th2, the set V1,2 contains a fragment v that c satisfies at level k with respect to {th1, th2}, for each k which is at most the height of c (generalizing the level-k satisfaction relation to several threads). Note that a joint fragment represents local pointer variables of both th1 and th2. In order to distinguish between local variables of th1 and th2, we use x[i] to denote a local variable x of thread thi. Here, we describe the intersection operation for level-1 fragments; the operation is analogous for higher-level fragments.

For a fragment v, define v.i.greachfrom as the set of global variables in v.i.reachfrom. Define v.i.greachto, v.o.greachfrom, v.o.greachto, v.i.gpvars, and v.o.gpvars analogously. Define v.i.gtag as the tuple (v.i.dabs, v.i.gpvars, v.i.greachfrom, v.i.greachto), and define v.o.gtag analogously. We must distinguish the following possibilities.

- v12.i.pvars = v1.i.pvars ∪ v2.i.pvars,
- v12.o.pvars = v1.o.pvars ∪ v2.o.pvars,
- v12.i.reachfrom = v1.i.reachfrom ∪ v2.i.reachfrom, and
- v12.o.reachfrom = v1.o.reachfrom ∪ v2.o.reachfrom.
- v′1.o.pvars = v1.o.pvars ∪ v2.o.pvars, and
- v′1.o.reachfrom = v1.o.reachfrom ∪ v2.o.reachfrom.

### **5 Arrays of Singly-Linked Lists with Timestamps**

In this section, we show how to apply fragment abstraction to concurrent programs that operate on a shared heap which represents an array of singly linked lists. We use this abstraction to provide the first automated verification of linearizability for the Timestamped Stack and Timestamped Queue algorithms of [12], as reported in Sect. 6.

**Fig. 8.** Description of the Timestamped stack algorithm, with some simplifications.

Figure 8 shows a simplified version of the Timestamped Stack (TS stack) of [12], where we have omitted the check for emptiness in the pop method and the optimization using push-pop elimination. These features are included in the full version of the algorithm, which we have verified automatically.

The algorithm uses an array of singly-linked lists (SLLs), one for each thread, accessed via the thread-indexed array pools[maxThreads] of pointers to the first cell of each list. The init method initializes each of these pointers to null. Each list cell contains a data value, a timestamp value, a next pointer, and a boolean flag mark which indicates whether the node is logically removed from the stack. Each thread pushes elements only to "its own" list, but can pop elements from any list.

A push method for inserting a data element d works as follows: first, a new cell with element d and minimal timestamp −1 is inserted at the beginning of the list indexed by the calling thread (lines 1–3). After that, a new timestamp is created and assigned (via the variable t) to the ts field of the inserted cell (lines 4–5). Finally, the method unlinks (i.e., physically removes) all cells that are reachable (through a sequence of next pointers) from the inserted cell and whose mark field is true; these cells are already logically removed. This is done by redirecting the next pointer of the inserted cell to the first cell with a false mark field that is reachable from the inserted cell.

A pop method first traverses all lists, finding in each list the first cell whose mark field is false (line 8), and letting the variable youngest point to the most recent such cell (i.e., the one with the largest timestamp) (lines 1–11). A compare-and-swap (CAS) is used to set the mark field of this youngest cell to true, thereby logically removing it. The procedure restarts if the CAS fails. After the youngest cell has been removed, the method unlinks all cells whose mark field is true that appear before (lines 17–19) or after (lines 20–23) the removed cell. Finally, the method returns the data value of the removed cell.
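The push and pop methods just described can be sketched as a sequential model (no concurrency, no CAS, and, as in the simplified version of Fig. 8, no emptiness check). The class and field names are illustrative stand-ins for the pseudocode of Fig. 8, and the line-number comments refer to the description above.

```python
# Sequential sketch of the simplified TS stack: push prepends to the caller's
# list and then assigns a timestamp; pop logically removes the unmarked cell
# with the largest timestamp across all lists.
class Cell:
    def __init__(self, data, nxt):
        self.data, self.ts, self.mark, self.next = data, -1, False, nxt

class TSStack:
    def __init__(self, max_threads):
        self.pools = [None] * max_threads   # one SLL head per thread
        self.clock = 0                      # stand-in for the timestamp source

    def push(self, tid, d):
        cell = Cell(d, self.pools[tid])     # lines 1-3: insert with ts = -1
        self.pools[tid] = cell
        self.clock += 1                     # lines 4-5: create and assign
        cell.ts = self.clock                # a new timestamp
        while cell.next and cell.next.mark: # unlink logically removed cells
            cell.next = cell.next.next

    def pop(self):
        youngest = None
        for head in self.pools:             # scan every list for the
            c = head                        # youngest unmarked cell
            while c:
                if not c.mark and (youngest is None or c.ts > youngest.ts):
                    youngest = c
                c = c.next
        youngest.mark = True                # logical removal (a CAS in Fig. 8)
        return youngest.data

s = TSStack(2)
s.push(0, 'a'); s.push(1, 'b')
assert s.pop() == 'b' and s.pop() == 'a'    # LIFO order across lists
```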

**Fragment Abstraction.** In our verification, we establish that the TS stack algorithm of Fig. 8 is correct in the sense that it is a linearizable implementation of a stack data structure. For stacks and queues, we specify linearizability by observers that synchronize on call and return actions of methods, as shown by [7]; this is done without any user-supplied annotation, hence the verification is fully automated.

The verification is performed analogously to that for skiplists, as described in Sect. 4. Here we show how fragment abstraction is used for arrays of singly-linked lists. Figure 9 shows an example heap state of TS stack in a configuration where it is accessed concurrently by three threads th1, th2, and th3. The heap consists of three SLLs, accessed from the three pointers pools[1], pools[2], and pools[3] of the array pools[maxThreads], respectively. Each heap cell is shown with the values of its fields, using the layout shown to the right in Fig. 9. In addition, each cell is labeled by the pointer variables that point to it. We use lvar(i) to denote the local variable lvar of thread thi.

In the heap state of Fig. 9, thread th1 is trying to push a new node with data value 4, pointed to by its local variable new, having reached line 3. Thread th3 has just called the push method. Thread th2 has reached line 12 in the execution of the pop method, and has just assigned youngest to the first node in the list pointed to by pools[3] which is not logically removed (in this case, the last node of that list). The observer has two registers x1 and x2, which are assigned the values 4 and 2, respectively.

We verify the algorithm using a symbolic representation that is analogous to the one used for skiplists. There are two main differences.


**Fig. 9.** A possible heap state of TS stack with three threads.

In the definition of tags, the only global variables that can occur in the fields reachfrom and reachto are therefore pools[me] and pools[other]. The data abstraction represents (i) for each cell, the set of observer registers whose values are equal to the data field, and (ii) for each timestamp and observer register xi, the possible orderings between this timestamp and the timestamp of an xi-cell.

**Fig. 10.** Fragment abstraction

Figure 10 shows a set of fragments that is satisfied with respect to th2 by the configuration in Fig. 9. There are 7 fragments, named v1, ..., v7. Consider the tag which occurs in fragment v7. This tag is an abstraction of the bottom-rightmost heap cell in Fig. 9. The different non-pointer fields are represented as follows.


**Symbolic Postcondition Computation.** The symbolic postcondition computation is similar to that for skiplists. The main differences are as follows.


### **6 Experimental Results**

Based on our framework, we have implemented a tool in OCaml and used it for verifying various kinds of concurrent data structure implementations: stacks, priority queues, queues, and sets. All of them are based on heap structures. We consider three types of heap structures in our experiments.


**Fig. 11.** Times for verifying concurrent data structure implementations. Column **a** shows the verification times for our tool based on fragment abstraction. Column **b** shows the verification times for the tool for SLLs in our previous work [3]

*Singly-linked list benchmarks:* These benchmarks include stack, queue, and set algorithms that are well known in the literature. The challenge is that in some set implementations the linearization points are not fixed, but depend on the future of each execution. The sets with non-fixed linearization points are the lazy set [20], the lock-free sets of *HM* [22], *Harris* [17], and *Michael* [29], and the unordered set of [48]. We handle these using the observers and controllers of our previous work [3]; our approach is simple yet strong enough to verify these singly-linked list benchmarks.

*Skiplist benchmarks:* We consider four skiplist algorithms: the lock-based skiplist set [31], the lock-free skiplist set described in Sect. 2 [22], and two skiplist-based priority queues [26,27]. One challenge in verifying these algorithms is dealing with an unbounded number of levels. In addition, in the lock-free skiplist [22] and priority queue [26], the skiplist shape is not well formed, meaning that each higher-level list need not be a sub-list of the lower-level lists. These algorithms have not been automatically verified in previous work; to the best of our knowledge, our fragment abstraction provides the first framework that can automatically verify these concurrent skiplist algorithms.

*Arrays of singly-linked list benchmarks:* We consider the two challenging timestamp algorithms of [12]. There are two challenges when verifying these algorithms: the first is dealing with an unbounded number of SLLs, and the second is that the linearization points of the algorithms are not fixed, but depend on the future of each execution. By combining our fragment abstraction with the observers for stacks and queues in [7], we are able to verify these two algorithms automatically. The observers are crucial for achieving automation, since they enforce the weakest possible ordering constraints that are necessary for proving linearizability, thereby making it possible to use a less precise abstraction.

*Running Times.* The experiments were performed on a desktop with a 2.8 GHz processor and 8 GB memory. The results are presented in Fig. 11, where running times are given in seconds. Column a shows the verification times of our tool, whereas column b shows the verification times for algorithms based on SLLs, using the technique of our previous work [3]. In our experiments, we ran the tool together with an observer from [1,7] and controllers from [3] to verify linearizability of the algorithms. All experiments start from the initial heap, and end either when the analysis reaches a fixed point or when a violation of safety properties or linearizability is detected. As can be seen from the table, the verification times vary across the examples; this is due to the types of shapes that are produced during the analysis. For instance, the skiplist algorithms have much longer verification times, owing to the number of pointer variables and their complicated shapes. In contrast, the other algorithms produce simple shape patterns and hence have shorter verification times.

*Error Detection.* In addition to establishing correctness of the original versions of the benchmark algorithms, we tested our tool on versions with intentionally inserted bugs. For example, we omitted the timestamp-assignment statement at line 5 of the push method in the TS stack algorithm, or omitted the CAS statements in the lock-free algorithms. The tool, as expected, successfully detected and reported the bugs.

### **7 Conclusions**

We have presented a novel shape abstraction, called fragment abstraction, for automatic verification of concurrent data structure implementations that operate on different forms of dynamically allocated heap structures, including singly-linked lists, skiplists, and arrays of singly-linked lists. Our approach is the first framework that can automatically verify concurrent data structure implementations that employ skiplists and arrays of singly-linked lists, while at the same time handling an unbounded number of concurrent threads, an unbounded domain of data values (including timestamps), and an unbounded shared heap. We showed that fragment abstraction combines local and global reachability information in a way that allows verification of the functional behavior of a collection of threads.

As future work, we intend to investigate whether fragment abstraction can be applied also to other heap structures, such as concurrent binary search trees.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Security

### **Reasoning About a Machine with Local Capabilities Provably Safe Stack and Return Pointer Management**

Lau Skorstengaard1(B) , Dominique Devriese<sup>2</sup>, and Lars Birkedal<sup>1</sup>

> <sup>1</sup> Aarhus University, Aarhus, Denmark {lau,birkedal}@cs.au.dk <sup>2</sup> imec-DistriNet, KU Leuven, Leuven, Belgium dominique.devriese@cs.kuleuven.be

**Abstract.** Capability machines provide security guarantees at machine level which makes them an interesting target for secure compilation schemes that provably enforce properties such as control-flow correctness and encapsulation of local state. We provide a formalization of a representative capability machine with local capabilities and study a novel calling convention. We provide a logical relation that semantically captures the guarantees provided by the hardware (a form of capability safety) and use it to prove control-flow correctness and encapsulation of local state. The logical relation is not specific to our calling convention and can be used to reason about arbitrary programs.

### **1 Introduction**

Compromising software security is often based on attacks that break programming language properties relied upon by software authors, such as control-flow correctness, local-state encapsulation, etc. Commodity processors offer little support for defending against such attacks: they offer security primitives with only coarse-grained memory protection and limited compartmentalization scalability. As a result, defenses against attacks on control-flow correctness and local-state encapsulation are either limited to only certain common forms of attacks (leading to an attack-defense arms race) and/or rely on techniques like machine code rewriting [1,2], machine code verification [3], virtual machines with a native stack [4] or randomization [5]. The latter techniques essentially emulate protection techniques on existing hardware, at the cost of performance, system complexity and/or security.

*Capability machines* are a type of processors that remediate these limitations with a better security model at the hardware level. They are based on old ideas [6– 8], but have recently received renewed interest; in particular, the CHERI project has proposed new ideas and ways of tackling practical challenges like backwards compatibility and realistic OS support [9,10]. Capability machines tag every word (in the register file and in memory) to enforce a strict separation between numbers and capabilities (a kind of pointers that carry authority). Memory capabilities carry the authority to read and/or write to a range of memory locations. There is also a form of *object capabilities*, which represent the authority to invoke a piece of code without exposing the code's encapsulated private state (e.g., the M-Machine's enter capabilities or CHERI's sealed code/data pairs).

Unlike commodity processors, capability machines lend themselves well to enforcing local-state encapsulation. Potentially, they will enable compilation schemes that enforce this property in an efficient but also 100% watertight way (ideally evidenced by a mathematical proof, guaranteeing that we do not end up in a new attack-defense arms race). However, a lot needs to happen before we get there. For example, it is far from trivial to devise a compilation scheme adapted to the details of a specific source language's notion of encapsulation (e.g., private member variables in OO languages often behave quite differently than private state in ML-like languages). And even if such a scheme were defined, a formal proof depends on a formalization of the encapsulation provided by the capability machine at hand.

A similar problem is the enforcement of control-flow correctness on capability machines. An interesting approach is taken in CheriBSD [9]: the standard contiguous C stack is split into a central, trusted stack, managed by trusted call and return instructions, and disjoint, private, per-compartment stacks. To prevent illegal use of stack references, the approach relies on *local capabilities*, a type of capabilities offered by CHERI to *temporarily* relinquish authority, namely for the duration of a function invocation whereafter the capability can be revoked. However, details are scarce (how does it work precisely? what features are supported?) and a lot remains to be investigated (e.g., combining disjoint stacks with cross-domain function pointers seems like it will scale poorly to large numbers of components?). Finally, there is no argument that the approach is watertight and it is not even clear what security property is targeted exactly.

In this paper, we make two main contributions: (1) an alternative calling convention that uses local capabilities to enforce stack frame encapsulation and well-bracketed control flow, and (2) perhaps more importantly, we adapt and apply the well-studied techniques of step-indexed Kripke logical relations for reasoning about code on a representative capability machine with local capabilities in general and correctness and security of the calling convention in particular. More specifically, we make the following contributions:


standard fundamental theorem of logical relations—to the best of our knowledge, our theorem is the most general and powerful formulation of the formal guarantees offered by a capability machine (a form of capability safety [11,12]), including the specific guarantees offered for local capabilities. It is very general and not tied to our calling convention or a specific way of using the system's capabilities. We are the first to apply these techniques for reasoning about capability machines and we believe they will prove useful for many other purposes than our calling convention.


For reasons of space, some details and all proofs have been omitted; please refer to the technical appendix [14] for those.

### **2 A Capability Machine with Local Capabilities**

In this paper, we work with a formal capability machine with all the characteristics of real capability machines, as well as local capabilities much like CHERI's. Otherwise, it is kept as simple as possible. It is inspired by both the M-Machine [6] and CHERI [9]. To avoid uninteresting details, we assume an infinite address space and unbounded integers.

We define the syntax of our capability machine in Fig. 1. We assume an infinite set of addresses Addr and define machine words as either integers or capabilities of the form ((*perm*, *g*), *base*, *end*, *a*). Such a capability represents the authority to execute permissions *perm* on the memory range [*base*, *end*], together with a current address *a* and a locality tag *g* indicating whether the capability is global or local. There is no notion of pointers other than capabilities, so we will use the terms interchangeably. The available permissions and their ordering are depicted in Fig. 3: the permissions include the null permission (o), read-only (ro), read/write (rw), read/execute (rx) and read/write/execute (rwx) permissions. Additionally, there are three special permissions: read/write-local (rwl), read/write-local/execute (rwlx) and enter (e), which we will explain below.
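The permission ordering of Fig. 3 can be sketched as a small preorder check. The edge set below is our reading of the figure (o below ro and e; ro below rw and rx; rw below rwx and rwl; rx below rwx; rwx and rwl below rwlx; e above only o), stated here as an assumption rather than a transcription of the figure.

```python
# Sketch of the permission preorder: p ≼ q iff there is a path of
# strengthening edges from p to q (or p == q).
COVERS = {  # direct edges p -> strictly stronger permissions
    'o':    {'ro', 'e'},
    'ro':   {'rw', 'rx'},
    'rw':   {'rwx', 'rwl'},
    'rx':   {'rwx'},
    'rwx':  {'rwlx'},
    'rwl':  {'rwlx'},
    'rwlx': set(),
    'e':    set(),   # enter is comparable only with o (below) and itself
}

def leq(p, q):
    """p ≼ q: permission p is no stronger than permission q."""
    if p == q:
        return True
    return any(leq(r, q) for r in COVERS[p])

assert leq('ro', 'rwlx') and leq('rw', 'rwl')     # strengthening chains
assert not leq('rx', 'rwl') and not leq('e', 'rx')  # incomparable pairs
```

An instruction like restrict would use leq to check that the requested new permission is no stronger than the capability's current one.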


**Fig. 1.** The syntax of our capability machine assembly language.



**Fig. 2.** An excerpt from the operational semantics.

We assume a finite set of register names RegName. We define register files *reg* and memories *ms* as functions mapping register names and addresses, respectively, to words. The state of the entire machine is represented as a configuration that is either a running state Φ ∈ ExecConf containing a memory and a register file, or a failed or halted state, where the latter keeps hold of the final state of memory.

The machine's instruction set is rather basic. Instructions i include relatively standard jump (jmp), conditional jump (jnz) and move (move, copies words between registers) instructions. Also familiar are load and store instructions for reading from and writing to memory (load and store) and arithmetic operators (lt (less than), plus and minus, operating only on numbers). There are three instructions for modifying capabilities: lea (modifies the current address), restrict (modifies the permission and local/global tag) and subseg (modifies the range of a capability). Importantly, these instructions take care that the resulting capability always carries less authority than the original (e.g. restrict will only weaken a permission). Finally, the instruction isptr tests whether a word is a capability or a number and instructions getp, getl, getb, gete and geta provide access to a capability's permissions, local/global tag, base, end and current address, respectively.

Figure 2 shows an excerpt of the operational semantics for a few representative instructions. Essentially, a configuration Φ either decodes and executes the instruction at Φ.reg(pc) if it is executable and its address is in the valid range, or otherwise fails. The table in the figure shows, for instructions i, the result of executing them in configuration Φ. fail and halt obviously fail and halt respectively. move simply modifies the register file as requested and updates the pc to the next instruction using the meta-function *updPc*.

The load instruction loads the contents of the requested memory location into a register, but only if the capability has appropriate authority (i.e. read permission and an appropriate range). restrict updates a capability's permissions and global/local tag in the register file, but only if the new permissions are weaker than the original. It also never turns local capabilities into global ones. geta queries the current address of a capability and stores it in a register.

The jmp instruction updates the program counter to a requested location, but it is complicated by the presence of *enter capabilities*, modeled after the M-Machine's [6]. Enter capabilities cannot be used to read, write or execute and their address and range cannot be modified. They can only be used to jump to, but when that happens, their permission changes to rx. They can be used to represent a kind of closures: an opaque package containing a piece of code together with local encapsulated state. Such a package can be built as an enter capability c = ((e, *g*), *b*, *e*, *a*) where the range [*b*, *a* − 1] contains local state (data or capabilities) and [*a*, *e*] contains instructions. The package is opaque to an adversary holding c but when c is jumped to, the instructions can start executing and have access to the local data through the updated version of c that is then in pc.
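The unsealing behavior of jmp on enter capabilities can be sketched as follows; the Cap record and register-file encoding are illustrative stand-ins for the machine state of Fig. 2.

```python
# Sketch of jmp: jumping to an enter capability ((e, g), b, e, a) installs
# an executable ((rx, g), b, e, a) in pc, giving the code at [a, e] access
# to the private data at [b, a-1].
from collections import namedtuple

Cap = namedtuple('Cap', 'perm g base end addr')

def jmp(reg, target):
    """Update pc with the word in register `target`."""
    cap = reg[target]
    if isinstance(cap, Cap) and cap.perm == 'e':
        cap = cap._replace(perm='rx')   # enter capability: opaque until
    reg['pc'] = cap                     # jumped to, then becomes rx
    return reg

# A closure: private data at [100, 119], code starting at address 120.
reg = {'r1': Cap('e', 'global', 100, 140, 120), 'pc': None}
reg = jmp(reg, 'r1')
assert reg['pc'].perm == 'rx' and reg['pc'].addr == 120
```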

Finally, the store instruction updates the memory to the requested value if the capability has write authority for the requested location. However, the instruction is complicated by the presence of *local capabilities*, modeled after the ones in the CHERI processor [9]. Basically, local capabilities are special in that they can only be kept in registers, i.e. they cannot be stored to memory. This means that local capabilities can be *temporarily* given to an adversary, for the duration of an invocation: if we take care to clear the capability from the register file after control is passed back to us, they will not have been able to store the capability. However, there is one exception to the rule above: local capabilities can be stored to memory for which we have a capability with write-local authority (i.e. permission rwl or rwlx). This is intended to accommodate a stack, where register contents can be stored, including local capabilities. As long as all capabilities with write-local authority are themselves local and the stack is cleared after control is passed back by the adversary, we will see that this does not break the intended behavior of local capabilities.
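The side condition on store can be sketched as follows; as above, the Cap record is an illustrative encoding, and the range/permission checks follow our reading of Fig. 2.

```python
# Sketch of store: writing a *local* capability requires write-local
# authority (rwl or rwlx); plain write permission suffices for numbers
# and global capabilities.
from collections import namedtuple

Cap = namedtuple('Cap', 'perm g base end addr')

def store(mem, cap, word):
    writable = cap.perm in ('rw', 'rwx', 'rwl', 'rwlx')
    in_range = cap.base <= cap.addr <= cap.end
    local_word = isinstance(word, Cap) and word.g == 'local'
    if not (writable and in_range):
        return 'fail'
    if local_word and cap.perm not in ('rwl', 'rwlx'):
        return 'fail'                   # locals need write-local authority
    mem[cap.addr] = word
    return 'ok'

mem = {}
stk = Cap('rwlx', 'local', 0, 99, 5)        # stack pointer: rwlx and local
heap = Cap('rwx', 'global', 100, 199, 150)  # malloc'd region: rwx only
loc = Cap('ro', 'local', 0, 9, 0)           # some local capability
assert store(mem, stk, loc) == 'ok'         # the stack may hold locals
assert store(mem, heap, loc) == 'fail'      # no write-local on the heap
assert store(mem, heap, 42) == 'ok'         # numbers store fine
```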

We point out that our local capabilities capture only a part of the semantics of local capabilities in CHERI. Specifically, in addition to the above, CHERI's default implementation of the CCall exception handler forbids local capabilities from being passed across module boundaries. Such a restriction fundamentally breaks our calling convention, since we pass around local return pointers and stack capabilities. However, CHERI's CCall is not implemented in hardware, but in software, precisely to allow experimenting with alternative models like ours.

In order to have a reasonably realistic system, we use a simple model of linking where a program has access to a linking table that contains capabilities for other programs. We also assume malloc to be part of the trusted computing base satisfying a certain specification. Malloc and linking tables are described further in the next section, but we refer to the technical appendix [14] for full details.

### **3 Stack and Return Pointer Management Using Local Capabilities**

One of the contributions in this paper is a demonstration that local capabilities on a capability machine support a calling convention that enforces control-flow correctness in a way that is provably watertight, potentially efficient, does not rely on a trusted central stack manager and supports higher-order interfaces to an adversary, where an adversary is just some unknown piece of code. In this section, we explain this convention's high-level approach, the security measures to be taken in a number of situations (motivating each separately with a summary table at the end). After that, we define a number of reusable macro-instructions that can be used to conveniently apply the proposed convention in subsequent examples.

The basic idea of our approach is simple: we stick to a single stack with register-passed stack and return pointers, much like a standard C calling convention. However, to prevent various ways of misusing this basic scheme, we put local capabilities to work and take a number of not-always-obvious safety measures. The safety measures are presented in terms of what *we* need to do to protect ourselves against an *adversary*, but this is only for presentation purposes, as our code assumes no special status on the machine. In fact, an adversary can apply the same safety measures to protect themselves against us. In the next paragraphs, we explain the issues to be considered in all the relevant situations: when (1) starting our program, (2) returning to the adversary, (3) invoking the adversary, (4) returning from the adversary, (5) invoking an adversary callback and (6) having a callback invoked by the adversary.

**Program Start-Up.** We assume that the language runtime initializes the memory as follows: a contiguous array of memory is reserved for the stack, for which we receive a stack pointer in a special register r*stk* . We stress that the stack is not built-in, but merely an abstraction we put on this piece of the memory. The stack pointer is local and has rwlx permission. Note that this means that we will be placing and executing instructions on the stack. Crucially, the stack is the only part of memory for which the runtime (including malloc, loading, linking) will ever provide rwlx or rwl capabilities. Additionally, our examples typically also assume some memory to store instructions or static data. Another part of memory (called the heap) is initially governed by malloc and at program start-up, no other code has capabilities for this memory. Malloc hands out rwx capabilities for allocated regions as requested (no rwlx or rwl permissions). For simplicity, we assume that memory allocated through malloc cannot be freed.

**Returning to the Adversary.** Perhaps the simplest situation is returning to the adversary after they invoked our code. In this case, we have received a return pointer from them, and we just need to jump to it as usual. An obvious security measure to take care of is properly clearing the non-return-value registers before we jump (since they may contain data or capabilities that the adversary should not get access to). Additionally, we may have used the stack for various purposes (register spilling, storing local state when invoking other functions etc.), so we also need to clear that data before returning to the adversary.

However, if we are returning from a function that has itself invoked adversary code, then clearing the used part of the stack is not enough. The *unused* part of the stack may also contain data and capabilities, left there by the adversary, including local capabilities since the stack is write-local. As we will see later, we rely on the fact that the adversary cannot keep hold of local capabilities when they pass control to the trusted code and receive control back. In this case, the adversary could use the unused part of the stack to store local pointers and load them from there after they get control back. To prevent this, we need to clear (i.e. overwrite with zeros) the entire part of the stack that the adversary has had access to, not just the parts that we have used ourselves. Since we may be talking about a large part of memory, this requirement is the most problematic aspect of our calling convention for performance, but see Sect. 6 for how this might be mitigated.
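The requirement can be illustrated with a small sketch (hypothetical Python model; `clear_range` plays the role of the stack-clearing code and the stored tuple stands in for a local capability left behind by the adversary): clearing only the part of the stack we used ourselves is not enough.

```python
def clear_range(mem, b, e):
    """Overwrite every address in [b, e] with zero (the stack-clearing idea)."""
    for addr in range(b, e + 1):
        mem[addr] = 0
    return mem

# The adversary stashed a (modeled) local capability in the *unused* part:
mem = {a: 0 for a in range(0, 10)}
mem[7] = ("local-cap", 20, 29)   # stand-in for a stored local capability

# Clearing only the part we used ourselves (say [0, 4]) leaves it behind:
partially = clear_range(dict(mem), 0, 4)
assert partially[7] != 0

# Clearing the entire range the adversary had access to removes it:
fully = clear_range(dict(mem), 0, 9)
assert all(v == 0 for v in fully.values())
```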

**Invoking the Adversary.** A slightly more complex case is invoking the adversary. As above, we clear all the non-argument registers, as well as the part of the stack that we are not using (because, as above, it may contain local capabilities from previously executed code that the adversary could exploit in the same way). We leave a copy of the stack pointer in r*stk* , but only after we have used the subseg instruction to shrink its authority to the part that we are not using ourselves.

In one of the registers, we also provide a return pointer, which must be a local capability. If it were global, the adversary would be able to store away the return pointer in a global data structure (i.e. a data structure for which there exists a global capability), and jump to it later, in circumstances where this should not be possible. For example, they could store the return pointer, legally jump to it a first time, wait to be invoked again and then jump to the old return pointer a second time, instead of the new return pointer received for the second invocation. Similarly, they could store the return pointer, invoke a function in our code, wait for us to invoke them again and then jump to the old return pointer rather than the new one received for the second invocation. By making the return pointer local, we prevent such attacks: the adversary can only store local capabilities through write-local capabilities, which means (because of our assumptions above): on the stack. Since the stack pointer itself is also local, it too can only be stored on the stack. Because we clear the part of the stack that the adversary has had access to before we pass control back, there is no way for them to recover either of these local capabilities.

Note that storing stack pointers for use during future invocations would also be dangerous in itself, i.e. not just because it can be used to store return pointers. Imagine the adversary stores their stack pointer, invokes trusted code that uses part of the stack to store private data and then invokes the adversary again with a stack pointer restricted to exclude the part containing the private data. If the adversary had a way of keeping hold of their old stack pointer, it could access the private data stored there by the trusted code and break local-state encapsulation.

**Returning from the Adversary.** So return pointers must be passed as local capabilities. But what should their permissions be, what memory should they point to and what should that memory (the activation record) contain? Let us answer the last question first by considering what should happen when the adversary jumps to a return pointer. In that case, the program counter should be restored to the instruction after the jump to the adversary, so the activation record should store this old program counter. Additionally, the stack pointer should also be restored to its original value. Since the adversary has a more restricted authority over the stack than the code making the call, we cannot hope to reconstruct the original stack pointer from the stack pointer owned by the adversary. Instead, it should be stored as part of the activation record.

Clearly, neither of these capabilities should be accessible by the adversary. In other words, the return pointer provided to the adversary must be a capability that they can jump to but not read from, i.e. an enter capability. To make this work, we construct the activation record as depicted in Fig. 4. The e return pointer has authority over the entire activation record (containing the previous return and stack pointer), and its current address points to a number of restore instructions in the record, so that upon invocation, these instructions are executed and can load the old stack pointer and program counter back into the register file. As the return pointer is an enter pointer, the adversary cannot get

**Fig. 4.** Structure of an activation record

hold of the activation record's contents, but after invocation, its permission is updated to rx, so the contents become available to the restore instructions.

The final question that remains is: where should we store this activation record? The attentive reader may already see that there is only one possibility: since the activation record contains the old stack pointer, which is local, the activation record can only be constructed in a part of memory where we have write-local access, i.e. on the stack. Note that this means we will be placing and executing instructions on the stack, i.e. it will not just contain code pointers and data. This means that our calling convention should be combined with protection against stack smashing attacks (i.e. buffer overflows on the stack overwriting activation records' contents). Luckily, the capability machine's fine-grained memory protection should make it reasonably easy for a compiler to implement such protection, by making sure that only appropriately bounded versions of the stack pointer are made available to source language code.
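The layout of an activation record can be sketched as follows (hypothetical Python model; the "opcode" strings are placeholders and the field order only loosely follows Fig. 4): the record holds restore code followed by the saved program counter and stack pointer, and the return pointer is an enter capability whose current address points at the restore code.

```python
def push_activation_record(stack, sp, old_pc, old_sp):
    """Lay out [restore instructions, old pc, old sp] at stack[sp..] and
    return an enter capability (modeled as a dict) pointing at the restore code."""
    restore_code = ["load-old-sp", "load-old-pc", "jump-old-pc"]  # placeholder opcodes
    record = restore_code + [old_pc, old_sp]
    for i, w in enumerate(record):
        stack[sp + i] = w
    # Enter capability: opaque to the adversary; upon jump its permission
    # becomes rx over the record, so the restore code can read old pc/sp.
    return {"perm": "e", "local": True,
            "b": sp, "e": sp + len(record) - 1, "a": sp}

stack = {}
ret = push_activation_record(stack, 100, old_pc="pc-after-call", old_sp="old-stack-cap")
assert ret["perm"] == "e" and ret["a"] == 100
assert stack[103] == "pc-after-call" and stack[104] == "old-stack-cap"
```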

**Invoking an Adversary Callback.** If we have a higher-order interface to the adversary, we may need to invoke an adversary callback. In this case, not so much changes with respect to the situation where we invoke static adversary code. The adversary can provide a callback as a capability for us to jump to, either an e-capability if they want to protect themselves from us or just an rx capability if they are not worried about that. However, there is one scenario that we need to prevent: if they construct the callback capability to point into the stack, it may contain local capabilities that they should not have access to upon invocation of the callback. As before, this includes return and stack pointers from previous stack frames that they may be trying to illegally use inside the callback.

To prevent this, we only accept callbacks from the adversary in the form of global capabilities, which we dynamically check before invoking them (and we fail otherwise). This should not be an overly strict requirement: our own callbacks do not contain local data themselves, so there should be no need for the adversary to construct callbacks on the stack.<sup>1</sup>
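The dynamic check can be sketched as follows (hypothetical Python model of the reqglob idea; capabilities are modeled as (perm, locality, b, e, a) tuples, which is our own encoding):

```python
def require_global(word):
    """Accept a callback only if it is a global capability; fail otherwise
    (failure modeled by raising). This mirrors the reqglob check."""
    if not (isinstance(word, tuple) and len(word) == 5):
        raise RuntimeError("failed: not a capability")
    perm, locality, b, e, a = word
    if locality != "global":
        raise RuntimeError("failed: callback capability is local")
    return word

# A global enter capability is accepted; a local one (e.g. pointing into
# the stack) is rejected:
assert require_global(("e", "global", 0, 9, 0))
for bad in [("e", "local", 0, 9, 0), 42]:
    try:
        require_global(bad)
        raised = False
    except RuntimeError:
        raised = True
    assert raised
```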

**Having a Callback Invoked by the Adversary.** The above leaves us with perhaps the hardest scenario: how to provide a callback to the adversary. The

<sup>1</sup> Note that it does prevent a legitimate but non-essential scenario where the adversary wants to give us temporary access to a callback not allocated on the stack.

basic idea is that we allocate a block of memory using malloc that we fill with the capabilities and data that the callback needs, as well as some prelude instructions that load the data into registers and jump to the right code. Note that this implies that no local capabilities can be stored as part of a closure. We can then provide the adversary with an enter-capability covering the allocated block and pointing to the contained prelude instructions. However, the question that remains in this setup is: from where do we get a stack pointer when the callback is invoked?

Our answer is that the adversary should provide it to us, just as we provide them with a stack pointer when we invoke their code. However, it is important that we do not just accept any capability as a stack pointer but check that it is safe to use. Specifically, we check that it is indeed an rwlx capability. Without this check, an adversary could potentially get control over our local stack frame during a subsequent callback by passing us a local rwx capability to a global data structure instead of a proper stack pointer, together with a global callback for our callback to invoke. If our local state contains no local capabilities then, assuming we otherwise follow our calling convention, the callback would not fail, and the adversary could use a stored capability for the global data structure to access our local state. To prevent this from happening, we need to make sure the stack capability carries rwlx authority, since the system-wide assumption then tells us that the adversary cannot have global capabilities to our local stack.
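As a sketch (hypothetical Python model; the real prepstk macro expands to machine instructions), the check on a received stack pointer can be modeled as:

```python
def prep_stack(cap):
    """Accept a received stack pointer only if it carries rwlx authority;
    otherwise fail (failure modeled by raising). Mirrors the prepstk check."""
    if cap.get("perm") != "rwlx":
        raise RuntimeError("failed: received stack pointer is not rwlx")
    return cap

# A proper stack pointer passes; an rwx capability to a global data
# structure (the attack described above) does not:
assert prep_stack({"perm": "rwlx", "local": True}) is not None
try:
    prep_stack({"perm": "rwx", "local": False})
    rejected = False
except RuntimeError:
    rejected = True
assert rejected
```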

**Calling Convention.** With the security measures introduced and motivated, let us summarize our proposed calling convention:

- *At program start-up:* a local rwlx stack pointer resides in register r*stk*; there are no global write-local capabilities.
- *Before returning to the adversary:* clear the non-return-value registers; clear the part of the stack we had access to (not just the part we used).
- *Before invoking the adversary:* push the activation record to the stack; create the return pointer as a local e-capability to the instructions in the record; restrict the stack capability to the unused part and clear it; clear the non-argument registers.
- *Before invoking an adversary callback:* make sure the callback is global.
- *When invoked by an adversary:* make sure the received stack pointer has permission rwlx.

**Reusable Macro Instructions.** We define a number of reusable macros capturing the calling convention and other conveniences. All macros that use the stack assume a stack pointer in register r*stk*. The macro fetch r *name* fetches the capability related to *name* from the linking table and stores it in register r. The macros push r and pop r add and remove elements from the stack. The macro prepstk r is used when a callback is invoked by the adversary and prepares the received stack pointer by checking that it has permission rwlx. The macro scall <sup>r</sup>(r*args*,r*priv* ) jumps to the capability in register <sup>r</sup> in the manner described above. That is, it pushes local state (the contents of registers r*priv* ) and the activation record (return code, return pointer, stack pointer) to the stack, creates an e return pointer, restricts the stack pointer, clears the unused part of the stack, clears the necessary registers and jumps to r. Upon return, the private state is restored. The macro mclear r clears all the memory the capability in register r has authority over. The macro rclear *regSet* clears all the registers in *regSet*. The macro reqglob r checks whether the word in register r is a global capability. The macro crtcls (x*i*, r*i*) r allocates a closure where r points to the closure's code and a new environment is allocated (using malloc) where the contents of the registers r*<sup>i</sup>* are stored. In the code referred to by r, an implicit fetch happens when an instruction refers to x*i*.
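The sequence of steps performed by scall can be outlined as follows (hypothetical Python sketch; every step is a stub naming the corresponding macro expansion, not actual machine code):

```python
def scall(target, args, priv_regs, sp_used_end, stk):
    """Outline of the scall macro's steps; stk = (b, e) is the range of the
    current stack capability, sp_used_end the last address we used ourselves."""
    steps = []
    steps.append(("push", priv_regs))                  # save private registers
    steps.append(("push-activation-record", None))     # return code, old pc, old sp
    steps.append(("make-return-ptr", "local e-cap"))   # enter cap into the record
    unused = (sp_used_end + 1, stk[1])
    steps.append(("subseg-stack", unused))             # restrict to the unused part
    steps.append(("clear-unused-stack", unused))       # no leftover local caps
    steps.append(("clear-non-arg-registers", args))
    steps.append(("jmp", target))
    return steps

trace = scall("r1", args=["r2"], priv_regs=["r3"], sp_used_end=104, stk=(100, 200))
assert [s[0] for s in trace] == [
    "push", "push-activation-record", "make-return-ptr",
    "subseg-stack", "clear-unused-stack", "clear-non-arg-registers", "jmp"]
```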

The technical appendix [14] contains detailed descriptions of all the macros.

### **4 Logical Relation**

In this section, we formalize the guarantees provided by the capability machine, including the specific guarantees for local capabilities, by means of a step-indexed Kripke logical relation with recursively defined worlds. We use the logical relation in the following section to show local-state encapsulation and control-flow integrity properties for challenging example programs.

#### **4.1 Worlds**

A world is a finite map from region names, modeled as natural numbers, to regions that each correspond to an invariant of part of the memory. We have three types of regions: *permanent*, *temporary*, and *revoked*. Each permanent and temporary region contains a state transition system, with public and private transitions, to describe how the invariants are allowed to change over time. In other words, they are protocols for the region's memory. These are similar to what has been used in logical relations for high-level languages [11,13,15]. Protocols imposed by permanent regions stay in place indefinitely. Any capability, local or global, can depend on these protocols. Protocols imposed by temporary regions can be revoked in private future worlds. Doing this may break the safety of local capabilities but not global ones. This means that local capabilities can safely depend on the protocols imposed by temporary regions, but global capabilities cannot, since a global capability may outlive a temporary region that is revoked. This is illustrated in Fig. 5.

**Fig. 5.** The relation between local/global capabilities and temporary/permanent regions. The colored fields are regions governing parts of memory. Global capabilities cannot depend on temporary regions.

For technical reasons, we do not actually remove a revoked temporary region from the world, but we turn it into a special revoked region that exists for this purpose. Such a revoked region contains no state transition system and puts no requirements on the memory. It simply serves as a mask for a revoked temporary region. Masking a region like this goes back to earlier work of Ahmed [16] and was also used by Birkedal et al. [17].

Regions are used to define safe memory segments, but this set may itself be world-dependent. In other words, our worlds are defined recursively. Recursive worlds are common in Kripke models, and the following theorem constructs ours using the method of Birkedal and Bizjak [18] and Birkedal et al. [19]. The formulation of the theorem is technical, so we recommend that non-expert readers ignore the technicalities and simply accept that there exists a set of worlds Wor and two relations ⊒^priv and ⊒^pub satisfying the (recursive) equations in the theorem (where the ▶ operator can be safely ignored).

**Theorem 1.** *There exists a c.o.f.e. (complete ordered family of equivalences)* Wor *and preorders* ⊒^priv *and* ⊒^pub *such that* (Wor, ⊒^priv) *and* (Wor, ⊒^pub) *are preordered c.o.f.e.'s, and there exists an isomorphism* ξ *such that*

$$\begin{array}{l}
\xi : \text{Wor} \cong \blacktriangleright\left(\mathbb{N} \stackrel{\text{fin}}{\rightharpoonup} \text{Region}\right)\\[4pt]
\text{Region} = \{\text{revoked}\}\\
\quad \uplus\ \{\text{temp}\} \times \text{State} \times \text{Rels} \times \left(\text{State} \to \left(\text{Wor} \xrightarrow[pub]{\text{mon, ne}} \text{UPred}(\text{MemSeg})\right)\right)\\
\quad \uplus\ \{\text{perm}\} \times \text{State} \times \text{Rels} \times \left(\text{State} \to \left(\text{Wor} \xrightarrow[priv]{\text{mon, ne}} \text{UPred}(\text{MemSeg})\right)\right)
\end{array}$$

*and for all* W, W' ∈ Wor:

$$W' \sqsupseteq^{priv} W \Leftrightarrow \xi(W') \sqsupseteq^{priv} \xi(W) \qquad\qquad W' \sqsupseteq^{pub} W \Leftrightarrow \xi(W') \sqsupseteq^{pub} \xi(W)$$

In the above theorem, State × Rels corresponds to the aforementioned state transition system, where Rels contains pairs of relations corresponding to the public and private transitions, and State is an unspecified set that we assume to contain at least the states we use in this paper. The last part of the temporary and permanent regions is a state interpretation function that determines what memory segments the region permits in each state of the state transition system. The different monotonicity requirements in the two interpretation functions reflect how permanent regions rely only on permanent protocols, whereas temporary regions can rely on both temporary and permanent protocols. UPred(MemSeg) is the set of step-indexed, downwards-closed predicates on memory segments:

$$\text{UPred}(\text{MemSeg}) = \{A \subseteq \mathbb{N} \times \text{MemSeg} \mid \forall (n, ms) \in A.\ \forall m \leq n.\ (m, ms) \in A\}.$$
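The downwards-closure condition on UPred(MemSeg) can be checked directly (hypothetical Python model; memory segments are represented by hashable tokens, which is our own simplification):

```python
def downwards_closed(A):
    """A ⊆ N × MemSeg is downwards closed iff (n, ms) ∈ A implies
    (m, ms) ∈ A for every m ≤ n."""
    return all((m, ms) in A for (n, ms) in A for m in range(n + 1))

ms1, ms2 = ("ms1",), ("ms2",)
good = {(0, ms1), (1, ms1), (2, ms1), (0, ms2)}
bad = {(2, ms1), (0, ms1)}          # missing (1, ms1)
assert downwards_closed(good)
assert not downwards_closed(bad)
```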

With the recursive domain equation solved, we could take Wor as our notion of worlds, but it is technically more convenient to work with the following definition instead:

$$\text{World} = \mathbb{N} \stackrel{\text{fin}}{\rightharpoonup} \text{Region}$$

**Future Worlds.** The future world relations model how memory may evolve over time. The *public future world* relation W' ⊒^pub W requires that dom(W') ⊇ dom(W) and ∀r ∈ dom(W). W'(r) ⊒^pub W(r). That is, in a public future world, new regions may have been allocated, and existing regions may have evolved according to the public future region relation (defined below). The *private future world* relation W' ⊒^priv W is defined similarly, using the private future region relation. The *public future region* relation is the simplest. It satisfies the following properties:

$$\frac{(s,s') \in \phi\_{pub}}{(v,s',\phi\_{pub},\phi,H) \sqsupseteq^{pub}(v,s,\phi\_{pub},\phi,H)} \qquad \frac{(\text{temp},s,\phi\_{pub},\phi,H) \in \text{Region}}{(\text{temp},s,\phi\_{pub},\phi,H) \sqsupseteq^{pub} \text{revoked}}$$
 
$$\overline{\text{revoked} \sqsupseteq^{pub} \text{revoked}}$$

Both temporary and permanent regions are only allowed to transition according to the public part of their transition system. Additionally, revoked regions must either remain revoked or be replaced by a temporary region. This means that the public future region relation allows us to reinstate a region that has been revoked earlier. The *private future region* relation satisfies:

$$\frac{(s, s') \in \phi}{(v, s', \phi\_{pub}, \phi, H) \sqsupseteq^{priv}(v, s, \phi\_{pub}, \phi, H)} \qquad \frac{r \in \text{Region}}{r \sqsupseteq^{priv}(\text{temp}, s, \phi\_{pub}, \phi, H)}$$

$$\frac{r \in \text{Region}}{r \sqsupseteq^{priv} \text{revoked}}$$

Here, revocation of temporary regions is allowed. In fact, temporary regions can be replaced by an arbitrary other region, not just the special revoked region. Conversely, revoked regions may also be replaced by any other region. Permanent regions, on the other hand, cannot be masked away: they are only allowed to transition according to the private part of their transition system.

Notice that the public future region relation is a subset of the private future region relation.
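The two future-region relations can be sketched as predicates (hypothetical Python model; regions are encoded as tuples (kind, state, pub-transitions, priv-transitions) or the token "revoked", and we omit the reflexive-transitive closure that makes the relations preorders):

```python
def pub_future(r_new, r_old):
    """Public future-region relation: regions step along public transitions only;
    a revoked region may remain revoked or be reinstated as a temporary region."""
    if r_old == "revoked":
        return r_new == "revoked" or r_new[0] == "temp"
    if r_new == "revoked" or r_new[0] != r_old[0]:
        return False
    return (r_old[1], r_new[1]) in r_new[2]       # (s, s') in the public transitions

def priv_future(r_new, r_old):
    """Private future-region relation: temporary and revoked regions may be
    replaced by any region; permanent regions step along private transitions."""
    if r_old == "revoked" or r_old[0] == "temp":
        return True
    if r_new == "revoked" or r_new[0] != "perm":
        return False
    return (r_old[1], r_new[1]) in r_new[3]       # (s, s') in the private transitions

pub, priv = {(0, 1)}, {(0, 1), (1, 0)}
temp0 = ("temp", 0, pub, priv)
perm0, perm1 = ("perm", 0, pub, priv), ("perm", 1, pub, priv)
assert priv_future("revoked", temp0)       # temp may be revoked privately...
assert not pub_future("revoked", temp0)    # ...but not publicly
assert pub_future(temp0, "revoked")        # revoked may be reinstated publicly
assert pub_future(perm1, perm0)            # (0, 1) is a public transition
assert not pub_future(perm0, perm1)        # (1, 0) is private only
```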

**World Satisfaction.** A memory satisfies a world, written *ms* :*<sup>n</sup>* W, if it can be partitioned into disjoint parts such that each part is accepted by an active (permanent or temporary) region. Revoked regions are not taken into account as their memory protocols are no longer in effect.

$$ms :\_n W \quad \text{iff} \quad \begin{array}{l}
\exists P : \text{active}(W) \to \text{MemSeg}.\ ms = \biguplus\_{r \in \text{active}(W)} P(r) \ \text{ and}\\
\forall r \in \text{active}(W).\ \exists H, s.\ W(r) = (\\_, s, \\_, \\_, H) \ \text{ and }\ (n, P(r)) \in H(s)(\xi^{-1}(W)).
\end{array}$$

[Fig. 6 gives the formal definitions of the observation relation O, the register-file relation R, the expression relation E, and the value relation V.]

**Fig. 6.** The logical relation.

### **4.2 Logical Relation**

The logical relation defines semantically when values, program counters, and configurations are capability safe. The definition is found in Figs. 6 and 7 and we provide some explanations in the following paragraphs. For space reasons, we omit some definitions and explain them only verbally, but precise definitions can be found in the technical appendix [14].

First, the *observation relation* O defines what configurations we consider safe. A configuration is safe with respect to a world when the execution of said configuration does not break the memory protocols of the world. Roughly speaking, this means that when the execution of a configuration halts, there is a private future world that the resulting memory satisfies. Notice that failing is considered safe behavior. In fact, the machine often resorts to failing when an unauthorized access is attempted, such as loading from a capability without read permission. This is similar to the logical relation of Devriese et al. [11] for an untyped language, but unlike typical logical relations for typed languages, which require that programs do not fail.

The *register-file relation* R defines safe register-files as those that contain safe words (i.e. words in V) in all registers but pc. The *expression relation* E defines that a word is safe to use as a program counter if it can be plugged into

$$\begin{array}{l}
\mathit{readCond}(g)(W) = \left\{ (n, (b, e)) \;\middle|\; \begin{array}{l} \exists r \in \mathit{localityReg}(g, W).\ \exists [b', e'] \supseteq [b, e].\\ W(r) \stackrel{n}{\subseteq} \iota^{pwl}\_{b',e'} \end{array} \right\}\\[10pt]
\mathit{writeCond}(\iota, g)(W) = \left\{ (n, (b, e)) \;\middle|\; \begin{array}{l} \exists r \in \mathit{localityReg}(g, W).\ W(r) \text{ is address-stratified.}\\ \exists [b', e'] \supseteq [b, e].\ W(r) \stackrel{n-1}{\supseteq} \iota\_{b',e'} \end{array} \right\}\\[10pt]
\mathit{execCond}(g)(W) = \left\{ (n, (P, b, e)) \;\middle|\; \begin{array}{l} \forall n' < n,\ W' \sqsupseteq W,\ a \in [b, e],\ \mathit{perm} \in P.\\ (n', ((\mathit{perm}, g), b, e, a)) \in \mathcal{E}(W') \end{array} \right\}\\[10pt]
\mathit{enterCond}(g)(W) = \left\{ (n, (b, e, a)) \;\middle|\; \begin{array}{l} \forall n' < n,\ W' \sqsupseteq W.\\ (n', ((\mathtt{rx}, g), b, e, a)) \in \mathcal{E}(W') \end{array} \right\}\\[6pt]
\text{where } g = \text{local} \Rightarrow\ \sqsupseteq\ =\ \sqsupseteq^{pub} \text{ and } g = \text{global} \Rightarrow\ \sqsupseteq\ =\ \sqsupseteq^{priv}
\end{array}$$

**Fig. 7.** Permission-based conditions

a safe register file (i.e. a register file in R) and paired with a memory satisfying the world to become a safe configuration. Note that integers and non-executable capabilities (e.g. ro and e capabilities) are considered safe program counters because when they are plugged into a register file and paired with a memory, the execution will immediately fail, which is safe.

The *value relation* V defines when words are safe. We make the value relation as liberal as possible by considering what is the most we can allow an adversary to use a capability for without breaking the memory protocols. Non-capability data is always safe because it provides no authority. Capabilities give the authority to manipulate memory and potentially break memory protocols, so they need to satisfy certain conditions to be safe. In Fig. 7, we define such a condition for each kind of permission a capability can have.

For capabilities with read permission, *readCond* ensures that they can only be used to read safe words, i.e. words in the value relation. To guarantee this, we require that the addressed memory is governed by a region W(r) that imposes safety as a requirement on the values contained. This safety requirement is formulated in terms of a standard region ι^{pwl}_{b,e}. The definition of that standard region is omitted for space reasons, but it simply requires all the words in the range [*b*, *e*] to be safe, i.e. in the value relation. Requiring that W(r) ⊆^n ι^{pwl}_{b,e} means that W(r) must accept only safe values, like ι^{pwl}_{b,e}, but can be even more restrictive if desired. The read condition also takes into account the locality of the capability because, generally speaking, global capabilities should only depend on permanent regions. Concretely, we use the function *localityReg*(*g*, W), which projects out all active (non-revoked) regions when the locality *g* is local, but only the permanent regions when *g* is global. The definition of the standard region ι^{pwl}_{b,e} can be found in [14]; it makes use of the isomorphism from Theorem 1.

For a capability with write permission, *writeCond* must be satisfied for the capability's range of authority. An adversary can use such a capability to write any word they can get hold of, and we can safely assume that they can only get hold of safe words, so the region governing the relevant memory must allow any safe word to be written there. In order to make the logical relation as liberal as possible, we make this a lower bound on what the region may allow. For write capabilities, we also have to take into account the two flavours of write permission: write and write-local. In the case of write-local capabilities, the region needs to allow (at least) any safe word to be written, but in the case of plain write capabilities, the capability cannot be used to write local capabilities, so the region only needs to allow safe non-local values. In the write condition, this is handled by parameterizing it with a region. For write-local capabilities the write condition is applied with the standard region ι^{pwl}_{b,e} that we described previously. For plain write capabilities we use a different standard region ι^{nwl}_{b,e}, which requires that the words in [*b*, *e*] are non-local and safe. As before, we use *localityReg* to pick an appropriate region based on the capability's locality. Finally, there is a technical requirement that the region must be *address-stratified*. Intuitively, this means that if a region accepts two memory segments, then it must also accept every memory segment "in between", that is, every memory segment where each address contains a value from one of the two accepted memory segments. An interesting property of the write condition is that it prohibits global write-local capabilities which, as discussed in Sect. 3, is necessary for any safe use of local capabilities.

The conditions *enterCond* and *execCond* are very similar. Both require that the capability can be safely jumped to. However, executable capabilities can be updated to point anywhere in their range, so they must be safe as a program counter (in the E-relation) no matter the current address. In contrast, enter capabilities are opaque and can only be used to jump to the address they point to. They also change permission when jumped to, so we require them to be safe as a program counter after the permission is changed to rx. Because the capabilities are not necessarily invoked immediately, this must hold in any future world, but which future worlds we consider depends on the capability's locality. If it is global, then we require safety as a program counter in *private* future worlds (where temporary regions may be revoked). For local capabilities, it suffices to be safe in *public* future worlds, where temporary regions are still present.

In the technical appendix, we prove that safety of all values is preserved in public future worlds, and that safety of global values is also preserved in private future worlds:

### **Lemma 1 (Double monotonicity of value relation)**

*– If* W' ⊒^pub W *and* (n, w) ∈ V(W)*, then* (n, w) ∈ V(W')*.*

*– If* W' ⊒^priv W *and* (n, w) ∈ V(W) *and* w = ((*perm*, global), *b*, *e*, *a*) *(i.e.* w *is a global capability), then* (n, w) ∈ V(W')*.*

### **4.3 Safety of the Capability Machine**

With the logical relation defined, we can now state the fundamental theorem of our logical relation: a strong theorem that formalizes the guarantees offered by the capability machine. Essentially, it says that a capability that only grants safe authority is safe to use as a program counter.

**Theorem 2 (Fundamental theorem).** *If one of the following holds:*


*then* (n,((*perm*, *<sup>g</sup>*), *<sup>b</sup>*, *<sup>e</sup>*, *<sup>a</sup>*)) ∈ E(W)

The permission-based conditions of Theorem 2 ensure that the capability only provides safe authority, in which case the capability must be in the E relation, i.e. it can safely be used as a program counter in an otherwise safe register file.

The Fundamental Theorem can be understood as a general expression of the guarantees offered by the capability machine, an instance of a general property called capability safety [11,12]. To understand this, consider that the theorem says the capability ((*perm*, *g*), *b*, *e*, *a*) is safe as a program counter, without any assumption about what instructions it actually points to (the only assumptions we have are about the read or write authority that it carries). As such, the theorem expresses the capability safety of the machine, which guarantees that *any* instruction is fine and will not be able to go beyond the authority of the values it has access to. We demonstrate this in Sect. 5, where Theorem 2 is used to reason about capabilities that point to arbitrary instructions. The relation between Theorem 2 and local-state encapsulation and control-flow correctness will also be shown by example in Sect. 5, as the examples depend on these properties for correctness. See the technical appendix [14] for a detailed proof (by induction over the step-index n) of the theorem.

### **5 Examples**

In this section, we demonstrate how our formalization of capability safety allows us to prove local-state encapsulation and control-flow correctness properties for challenging program examples. The security measures of Sect. 3 are deployed to ensure these properties. Since we are dealing with assembly language, there are many details to the formal treatment, and we therefore necessarily omit some details in the lemma statements. The examples may look deceptively short, but that is because they use the macro instructions described in Sect. 3; without the macros they would be unintelligible, as each macro expands to multiple basic instructions. The interested reader can find all the technical details in the technical appendix [14].


**Fig. 8.** Two example programs that rely on local-state encapsulation. f1 uses our stack-based calling convention. f2 does not rely on a stack.

### **5.1 Encapsulation of Local State**

f1 and f2 in Fig. 8 demonstrate the capability machine's encapsulation of local state. They are very similar: both store some local state, call an untrusted piece of code (*adv*), and then test whether the local state is unchanged. They differ in the way they do this. Program f1 uses our stack-based calling convention (captured by scall) to call the adversary, so it can use the available stack to store its local state. On the other hand, f2 uses malloc to allocate memory for its local state and uses an activation-record based calling convention (described in the technical appendix) to run the adversarial code.

For both programs, we can prove that if they are linked with an adversary, *adv*, that is allowed to allocate memory but has no other capabilities, then the assertion will never fail during execution (see Lemmas 2 and 3 below). The two examples also illustrate the versatility of the logical relation. The logical relation is not specific to any calling convention, so we can use it to reason about both programs, even though they use different calling conventions.
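To make the structure of f1 concrete, here is a high-level Python analogue (our sketch only: names are hypothetical, and Python scoping stands in for the protection that local stack capabilities provide on the real machine, where the adversary is arbitrary machine code that simply holds no capability for f1's stack frame):

```python
# High-level analogue of f1 (a sketch; the real f1 is capability-machine
# assembly and its guarantee comes from the scall calling convention,
# not from language-level scoping).
def f1(adv):
    local = 42          # local state stored in f1's stack frame
    adv()               # call untrusted code with no access to `local`
    assert local == 42  # local-state encapsulation: this never fails

def adversary():
    # untrusted code: may allocate its own memory (via malloc on the
    # machine) but cannot reach f1's local state
    scratch = [0] * 16
    scratch[0] = 1

f1(adversary)
```

Lemmas 2 and 3 below state the corresponding machine-level guarantee: the assertion flag remains 0 in every halted configuration.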

In order to formulate results about f1 and f2, we need a way to observe whether the assertion fails. To this end, we assume they have access to a flag (an address in memory). If the assertion fails, then the flag is set to 1 and execution halts. The correctness lemma for f1 then states:

**Lemma 2.** *Let*

$$\begin{aligned} c_{adv} &\stackrel{def}{=} ((\text{e}, \text{global}), \dots) & c_{stk} &\stackrel{def}{=} ((\text{rwlx}, \text{local}), \dots)\\ c_{f1} &\stackrel{def}{=} ((\text{rwx}, \text{global}), \dots) & c_{link} &\stackrel{def}{=} ((\text{ro}, \text{global}), \dots)\\ c_{malloc} &\stackrel{def}{=} ((\text{e}, \text{global}), \dots) & \mathit{reg} &\in \text{Reg}\\ m &\stackrel{def}{=} \mathit{ms}_{f1} \uplus \mathit{ms}_{flag} \uplus \mathit{ms}_{link} \uplus \mathit{ms}_{adv} \uplus \mathit{ms}_{malloc} \uplus \mathit{ms}_{stk} \uplus \mathit{ms}_{frame} \end{aligned}$$

*where each of the capabilities has an appropriate range of authority and pointer*<sup>2</sup>*. Furthermore*

*– ms<sub>f1</sub> contains* c<sub>link</sub>*,* c<sub>flag</sub>*, and the code of f1*

*– ms<sub>flag</sub>*(*flag*) = 0


*If* (*reg*[pc ↦ c<sub>f1</sub>][r<sub>stk</sub> ↦ c<sub>stk</sub>], m) →<sup>∗</sup> (*halted*, m′)*, then* m′(*flag*) = 0*.*

<sup>2</sup> These assumptions are kept intentionally vague for brevity. Full statements are in the technical appendix [14].

To prove Lemma 2, it suffices to show that the start configuration is safe (in the O relation) for a world with a permanent region that requires the assertion flag to be 0. By an anti-reduction lemma, it suffices to show that the configuration is safe after some reduction steps. We then use a general lemma for reasoning about scall, by which it suffices to show that (1) the configuration that scall will jump to is safe and (2) the configuration just after scall is done cleaning up is safe. We use the Fundamental Theorem to reason about the unknown adversarial code, but notice that the adversary capability is an enter capability, about which the Fundamental Theorem says nothing. Luckily, the enter capability becomes rx after the jump, and then the Fundamental Theorem applies.

We have a similar lemma for f2:

**Lemma 3.** *Making similar assumptions about capabilities and linking as in Lemma 2, but assuming no stack pointer: if* (*reg*[pc ↦ c<sub>f2</sub>], m) →<sup>∗</sup> (*halted*, m′)*, then* m′(*flag*) = 0*.*

### **5.2 Well-Bracketed Control-Flow**

Using the stack-based calling convention of scall, we get well-bracketed control flow. To illustrate this, we look at two example programs, f3 and g1, in Fig. 9.

In f3 there are two calls to an adversary and in order for the assertion in the middle to succeed, they need to be well-bracketed. If the adversary were able to store the return pointer from the first call and invoke it in the second call, then f3 would have 2 on top of its stack and the assertion would fail. However, the security measures in Sect. 3 prevent this attack: specifically, the return pointer is local, so it can only be stored on the stack, but the part of the stack that is accessible to the adversary is cleared before the second invocation. In fact, the following lemma shows that there are also no other attacks that can break well-bracketedness of this example, i.e. the assertion never fails. It is similar to the two previous lemmas:

**Lemma 4.** *Making similar assumptions about capabilities and linking as in Lemma 2: if* (*reg*[pc ↦ c<sub>f3</sub>][r<sub>stk</sub> ↦ c<sub>stk</sub>], m) →<sup>∗</sup> (*halted*, m′)*, then* m′(*flag*) = 0*.*

The final example, g1 with f4, is a faithful translation of a tricky example from the literature known as the awkward example [13,20]. It consists of two parts, g1 and f4. g1 is a closure generator that generates closures with one variable x, set to 0, in their environment and f4 as their code (note that we can omit some calling-convention security measures because the stack is not used in the closure generator). f4 expects one argument, a callback. It sets x to 0 and calls the callback. When it returns, it sets x to 1 and calls the callback a second time. When it returns again, it asserts that x is 1 and returns. This example is more complicated than the previous ones because it involves a closure invoked by the adversary and an adversary callback invoked by us. As explained in Sect. 3, this means that we need to check (1) that the stack pointer the closure receives from the adversary has write-local permission and (2) that the adversary callback is global.


**Fig. 9.** Two programs that rely on well-bracketedness of scalls to function correctly. offset is the offset to f4.

To illustrate how subtle this program is, consider how an adversary could try to make the assertion fail. In the second callback an adversary can get to the first callback by invoking the closure one more time. If there were any way for the adversary to transfer the return pointer from the point where it reinvokes the closure to where the closure reinvokes the callback, then the assertion could be made to fail. Similarly, if there were any way for the adversary to store a stack pointer or trick the trusted code into preserving it across an invocation, the assertion can likely be made to fail too. However, our calling convention prevents any of this from happening, as we prove in the following lemma.

### **Lemma 5.** *Let*

$$c_{adv} \stackrel{def}{=} ((\text{rwx}, \text{global}), \dots) \qquad c_{g1} \stackrel{def}{=} ((\text{e}, \text{global}), \dots)$$

*and otherwise make assumptions about capabilities and linking similar to Lemma 2. Then if* (*reg*<sub>0</sub>[pc ↦ c<sub>adv</sub>][r<sub>stk</sub> ↦ c<sub>stk</sub>][r<sub>1</sub> ↦ c<sub>g1</sub>], m) →<sup>∗</sup> (*halted*, m′)*, then* m′(*flag*) = 0*.*

As explained in Sect. 3, the macro-instruction reqglob r<sup>1</sup> checks that the callback is global, essentially to make sure it is not allocated on the stack where it might contain old stack pointers or return pointers. Otherwise, the encapsulation of our local stack frame could be broken. In the proof of Lemma 5, this requirement shows up because we invoke the callback in a world that is only a private future world of the one where we received the callback, precisely because we have invalidated the adversary's local state (particularly their old stack and return capabilities). The callback is still valid in this private future world, but only because we know that it is global.

In Lemma 5, the order of control has been inverted compared to the previous lemmas. In this lemma, the adversary assumes control first with a capability for the closure creator g1. Consequently, we need to check that all arguments are safe to use and that we clean up before returning in the end. The inversion of control poses an interesting challenge when it comes to reasoning about the adversary's local state during the execution of f4 and the callbacks, where the adversary should not rely on the local state from before the call of f4. This is easily done by revoking all the temporary regions of the world given at the start of f4. However, when f4 returns, the adversary is again allowed to rely on its old local state, so we need to guarantee that the local state is unchanged. This is important because the return pointer that f4 receives may be local, and the adversary is allowed to allocate the activation record on the stack (just like we do), so they can store and recover their old stack pointer after f4 returns. By utilizing the reinstatement mechanism of the future world relation as well as our knowledge of the future worlds used, we can construct a world in which the adversary's invariants are preserved. The details of this and the proofs of the other lemmas are found in the technical appendix [14].

### **6 Discussion**

### **Calling Convention**

*Formulating Control Flow Correctness.* While we claim that our calling convention enforces control-flow correctness, we do not prove a general theorem that shows this, because it is not clear what such a theorem should look like. Formulations in terms of a control-flow graph, like the one by Abadi et al. [2], do not take into account temporal properties, like the well-bracketedness that Example g1 relies on. In fact, our examples show that our logical relation implies a stronger form of control-flow correctness than such formulations, although this is not made very explicit. As future work, we plan to look for a more explicit and useful way to formalize control-flow correctness. The idea would be to define a variant of our capability machine with call and return instructions and well-bracketed control flow built into the operational semantics, and then prove that compiling such code to our machine using our calling convention is fully abstract [21].

*Performance and the Requirement for Stack Clearing.* The additional security measures of the calling convention described in Sect. 3 impose an overhead compared to a calling convention without security guarantees. However, most of our security measures require only a few atomic checks or register clearings on boundary crossings between trusted code and adversary, which should produce an acceptable performance overhead. The only exceptions are the requirements for stack clearing that we have in two situations: when returning to the adversary and when invoking an adversary callback. As we have explained, we need to clear all of the stack that we are not using ourselves, not just the part that we have actually used. In other words, on every boundary crossing between trusted code and adversary code, a potentially large region of memory must be cleared. We believe this is actually a common requirement for typical usage scenarios of local capabilities, and capability machines like CHERI should consider providing special support for this requirement, in the form of a highly optimized instruction for erasing a large block of memory. Nevertheless, from a discussion with the designers of the CHERI capability machine, we gather that it is not immediately clear whether and how such a primitive could be implemented efficiently in the CHERI context.

*Modularity.* It is important that our calling convention is modular, i.e. we do not assume that our code is specially privileged w.r.t. the adversary, and they can apply the same measures to protect themselves from us as we do to protect ourselves from them. More concretely, the requirements we have on callbacks and return pointers received from the adversary are also satisfied by callbacks and return pointers that we pass to them. For example, our return pointers are local capabilities because they must point to memory where we can store the old stack pointer, but the adversary's return pointers are also allowed to be local. Adversary callbacks are required to be global but the callbacks we construct are allocated on the heap and also global.

*Arguments and Local Capabilities.* Local capabilities are a central part of the calling convention, as they are used to construct stack and return pointers. The use of local capabilities for the calling convention unfortunately limits the extent to which local capabilities can be used for other things. Say we are using the calling convention and receive a local capability other than the stack and return pointer; then we need to be careful if we want to use it, because it may be an alias of the stack pointer. That is, if we first push something to the stack and then write to the local capability, then we may be (tricked into) overwriting our own local state. The logical relation helps by telling us what we need to ascertain or check in such scenarios to guarantee safety and preserve our invariants, but such checks may be costly, and it is not clear to us whether there are practical scenarios where they would be realistic.

We also need to be careful when we receive a capability from an adversary that we want to pass on to a different (instance of the) adversary. It turns out that the logical relation again tells us when this is safe. Namely, the logical relation says that we can only pass on safe arguments. For instance, when we receive a stack pointer from an adversary, then we may at some point want to pass on part of this stack pointer to, say, a callback. In order to do so, we need to make sure the stack pointer is safe which means that, if we have revoked temporary invariants, the stack must not directly or indirectly allow access to local values that we cannot guarantee safety of. When received from an adversary, we have to consider the contents of the stack unsafe, so before we pass it on, we have to clear it, or perform a dynamic safety analysis of the stack contents and anything it points to. Clearing everything is not always desirable and a dynamic safety analysis is hard to get right and potentially expensive.

In summary, the use of local capabilities for other things than stack and return pointers is likely only possible in very specific scenarios when using our calling convention. While this is unfortunate, it is not unheard of that processors have built-in constructs that are exclusively used for handling control flow, such as, for example, the call and return instructions that exist in some instruction sets.

*Single Stack.* A single stack is a good choice for the simple capability machine presented here, because it works well with higher-order functions. An alternative to a single stack would be to have a separate stack per component. The trouble with this approach is that, with multiple stacks and local stack pointers, it is not clear how components would retrieve their stack pointer upon invocation without compromising safety. A safe approach could be to have stack pointers stored by a central, trusted stack management component, but it is not clear how that could scale to large numbers of separate components. Handling large numbers of components is a requirement if we want to use capability machines to enforce encapsulation of, for example, every object in an object-oriented program or every closure in a functional program.

### **Logical Relation**

*Single Orthogonal Closure.* The definitions of E and V in Fig. 6 apply a single orthogonal closure, a new variant of an existing pattern called biorthogonality. Biorthogonality is a pattern for defining logical relations [20,22] in terms of an observation relation of safe configurations (like we do). The idea is to define safe evaluation contexts as the set of contexts that produce safe observations when plugging safe values and define safe terms as the set of terms that can be plugged into safe evaluation contexts to produce safe observations. This is an alternative to more direct definitions where safe terms are defined as terms that evaluate to safe values. An advantage of biorthogonality is that it scales better to languages with control effects like call/cc. Our definitions can be seen as a variant of biorthogonality, where we take only a single orthogonal closure: we do not define safe evaluation contexts but immediately define safe terms as those that produce safe observations when plugged with safe values. This is natural because we model arbitrary assembly code that does not necessarily respect a particular calling convention: return pointers are in principle values like all others and there is no reason to treat them specially in the logical relation.
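Schematically (our paraphrase, with the notation of Fig. 6 simplified, not the paper's exact definitions), biorthogonality closes twice through the observation relation O, whereas our definition closes only once:

$$\begin{aligned} \text{biorthogonality:}\quad & \mathcal{K} \stackrel{def}{=} \{E \mid \forall v \in \mathcal{V}.\ E[v] \in \mathcal{O}\} \qquad \mathcal{E} \stackrel{def}{=} \{e \mid \forall E \in \mathcal{K}.\ E[e] \in \mathcal{O}\}\\ \text{single closure:}\quad & \mathcal{E}(W) \stackrel{def}{=} \{w \mid \forall \mathit{reg}.\ (\forall r.\ \mathit{reg}(r) \in \mathcal{V}(W)) \Rightarrow (\mathit{reg}[\mathrm{pc} \mapsto w], \cdot) \in \mathcal{O}(W)\} \end{aligned}$$

In the single closure there is no separate set of safe evaluation contexts: a word is safe as a program counter if every configuration built from it and safe register contents yields a safe observation.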

Interestingly, Hur and Dreyer [23] also use a step-indexed, Kripke logical relation for an assembly language (for reasoning about correct compilation from ML to assembly), but because they only model non-adversarial code that treats return pointers according to a particular calling convention, they can use standard biorthogonality rather than a single orthogonal closure like us.

*Public/Private Future Worlds.* A novel aspect of our logical relation is how we model the temporary, revokable nature of local capabilities using public/private future worlds. The main insight is that this special nature generalizes that of the syntactically-enforced unstorable status of evaluation contexts in lambda calculi without control effects (of which well-bracketed control flow is a consequence). To reason about code that relies on this (particularly, the original awkward example), Dreyer et al. [13] (DNB) formally capture the special status of evaluation contexts using Kripke worlds with public and private future world relations. Essentially, they allow relatedness of evaluation contexts to be monotone with respect to a weaker future world relation (public) than relatedness of values, formalizing the idea that it is safe to make temporary internal state modifications (private world transitions, which invalidate the continuation, but not other values) while an expression is performing internal steps, as long as the code returns to a stable state (i.e. transitions to a public future world of the original) before returning. We generalize this idea to reason about local capabilities: validity of local capabilities is allowed to be monotone with respect to a weaker future-world relation than other values, which we can exploit to distinguish between state changes that are always safe (public future worlds) and changes that are only valid if we clear all local capabilities (private future worlds). Our future world relations are similar to DNB's (for example, our proof of the awkward example uses exactly the same state transition system), but they turn up in an entirely different place in the logical relation: rather than using public future worlds for the special syntactic category of evaluation contexts, they are used in the value relation depending on the locality of the capability at hand. 
Additionally, our worlds are a bit more complex because, to allow local memory capabilities and write-local capabilities, they can contain (revokable) temporary regions that are only monotone w.r.t. public future worlds, while DNB's worlds are entirely permanent.

*Local Capabilities in High-Level Languages.* We point out that local capabilities are quite similar to a feature proposed for the high-level language Scala: Osvald et al. [24]'s second-class or local values. These are values that can be provided to other code for immediate use without allowing them to be stored in a closure or reference for later use. We believe reasoning about such values will require techniques similar to what we provide for local capabilities.

### **7 Related Work**

Finally, we summarize how our work relates to previous work. We do not repeat the work we discussed in Sect. 6.

Capability machines originate with Dennis and Van Horn [7] and we refer to Levy [25] and Watson et al. [9] for an overview of previous work. The capability machine formalized in Sect. 2 is a simple but representative model, modeled mainly after the M-Machine [6] (the enter pointers resemble the M-Machine's) and CHERI [9,10] (the memory and local capabilities resemble CHERI's). The latter is a recent and relatively mature capability machine, which combines capabilities with a virtual memory approach, in the interest of backwards compatibility and gradual adoption. As discussed, our local capabilities can cross module boundaries, contrary to what is enforced by CHERI's default CCall implementation.

Plenty of other papers enforce well-bracketed control flow at a low level, but most are restricted to preventing particular types of attacks and enforce only partial correctness of control flow. This includes particularly the line of work on *control-flow integrity* [2]. These works use a quite different attacker model from ours: they assume an attacker that is not able to execute code, but can overwrite arbitrary data at any time during execution (to model buffer overflows). By checking the address of every indirect jump and using memory access control to prevent overwriting code, this line of work enforces what they call control-flow integrity, formalized as the property that every jump will follow a legal path in the control-flow graph. As discussed in Sect. 6, such a property ignores temporal properties and seems hard to use for reasoning.

More closely related to our work are papers that use a trusted stack manager and some form of memory isolation to enforce control-flow correctness as part of a secure compilation result [26,27]. Our work differs from theirs in that we use a different form of low-level security primitive (a capability machine with local capabilities rather than a machine with a primitive notion of compartments) and we do not use a trusted stack manager, but a decentralized calling convention based on local capabilities. Also, both prove a secure compilation result from a high-level language, which clearly implies a general form of control-flow correctness, while we define a logical relation that can be used to reason about specific programs that rely on well-bracketed control flow.

Our logical relation is a unary, step-indexed Kripke logical relation with recursive worlds [16,18,20,28], closely related to the one used by Devriese et al. [11] to formulate capability safety in a high-level JavaScript-like lambda calculus. Our Fundamental Theorem is similar to theirs and expresses capability safety of the capability machine. Because we are not interested in externally observable side-effects (like console output or memory access traces), we do not require their notion of effect parametricity. Our logical relation uses several ideas from previous work, like Kripke worlds with regions containing state transition systems [15], public/private future worlds [13] (see Sect. 6 for a discussion), and biorthogonality [20,23,29].

Swasey et al. [30] have recently developed a *logic*, OCPL, for verification of object capability patterns. The logic is based on Iris [31–33], a state-of-the-art higher-order concurrent separation logic, and is formalized in Coq, building on the Iris Proof Mode for Coq [34]. OCPL gives a more abstract and modular way of proving capability safety for a lambda calculus (with concurrency) compared to the earlier work by Devriese et al. [11].

El-Korashy also defined a formal model of a capability machine, namely CHERI, and used it to prove a compartmentalization result [35] (not implying control-flow correctness). He also adapted control-flow integrity (see above) to the machine and showed soundness, seemingly without relying on capabilities.

**Acknowledgements.** This research was supported in part by the ModuRes Sapere Aude Advanced Grant from The Danish Council for Independent Research for the Natural Sciences (FNU). Dominique Devriese holds a Postdoctoral fellowship from the Research Foundation Flanders (FWO).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Modular Product Programs**

### Marco Eilers(B), Peter Müller, and Samuel Hitz

Department of Computer Science, ETH Zurich, Zurich, Switzerland *{*marco.eilers,peter.mueller,samuel.hitz*}*@inf.ethz.ch

**Abstract.** Many interesting program properties like determinism or information flow security are hyperproperties, that is, they relate multiple executions of the same program. Hyperproperties can be verified using relational logics, but these logics require dedicated tool support and are difficult to automate. Alternatively, constructions such as self-composition represent multiple executions of a program by one product program, thereby reducing hyperproperties of the original program to trace properties of the product. However, existing constructions do not fully support procedure specifications, for instance, to derive the determinism of a caller from the determinism of a callee, making verification non-modular.

We present modular product programs, a novel kind of product program that permits hyperproperties in procedure specifications and, thus, can reason about calls modularly. We demonstrate its expressiveness by applying it to information flow security with advanced features such as declassification and termination-sensitivity. Modular product programs can be verified using off-the-shelf verifiers; we have implemented our approach to secure information flow using the Viper verification infrastructure.

### **1 Introduction**

The past decades have seen significant progress in automated reasoning about program behavior. In the most common scenario, the goal is to prove trace properties of programs such as functional correctness or termination. However, important program properties such as information flow security, injectivity, and determinism cannot be expressed as properties of individual traces; these so-called *hyperproperties* relate different executions of the same program. For example, proving determinism of a program requires showing that any two executions from identical initial states will result in identical final states.
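As a concrete (hypothetical) illustration of why determinism is not a property of individual traces, the following Python harness can only refute determinism by comparing two executions on identical inputs; no single trace suffices:

```python
import random

def det(x):
    return x * x + 1                 # output determined by input

def nondet(x):
    return x + random.randrange(2)   # output NOT determined by input

def appears_deterministic(prog, inputs, trials=50):
    # 2-safety check by testing: compare pairs of runs on equal inputs.
    # This only approximates the hyperproperty; it cannot prove it.
    return all(prog(x) == prog(x) for x in inputs for _ in range(trials))

assert appears_deterministic(det, range(5))
# nondet is caught with overwhelming probability, since each comparison
# runs the program twice and compares the two results
assert not appears_deterministic(nondet, range(5))
```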

An important attribute of reasoning techniques about programs is *modularity*. A technique is modular if it allows reasoning about parts of a program in isolation, e.g., verifying each procedure separately and using only the *specifications* of other procedures. Modularity is vital for scalability and to verify libraries without knowing all of their clients. Fully modular reasoning about hyperproperties thus requires the ability to formulate *relational* specifications, which relate different executions of a procedure, and to apply those specifications where the procedure is called. As an example, the statement

$$\text{if } (x) \text{ then } \{y := x\} \text{ else } \{y := \text{call } f(x)\}$$

can be proved to be deterministic if f's relational specification guarantees that its result deterministically depends on its input.

Relational program logics [11,27,29] allow directly proving general hyperproperties; however, automating relational logics is difficult and requires building dedicated tools. Alternatively, self-composition [9] and product programs [6,7] reduce a hyperproperty to an ordinary trace property, thus making it possible to use off-the-shelf program verifiers for proving hyperproperties. Both approaches construct a new program that combines the behaviors of multiple runs of the original program. However, by the nature of their construction, neither approach supports modular verification based on relational specifications: Procedure calls in the original program will be duplicated, which means that there is no single program point at which a relational specification can be applied. For the aforementioned example, self-composition yields the following program:

$$\begin{aligned} &\text{if } (x) \text{ then } \{y := x\} \text{ else } \{y := \text{call } f(x)\};\\ &\text{if } (x') \text{ then } \{y' := x'\} \text{ else } \{y' := \text{call } f(x')\} \end{aligned}$$

Determinism can now be verified by proving the trace property that identical values for x and x' in the initial state imply identical values for y and y' in the final state. However, such a proof cannot make use of a relational specification for procedure f (expressing that f is deterministic). Such a specification relates several executions of f, whereas each call in the self-composition belongs to a single execution. Instead, verification requires a *precise functional specification* of f, which *exactly* determines its result value in terms of the input. Verifying such precise functional specifications increases the verification effort and is at odds with data abstraction (for instance, a collection might not want to promise the exact iteration order); inferring them is beyond the state of the art for most procedures [28]. Existing product programs allow aligning or combining some statements and can thereby lift this requirement in some cases, but this requires manual effort during the construction, depends on the used specifications, and does not solve the problem in general.
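The self-composition above can be rendered executably as follows (our Python encoding; f's body is a hypothetical stand-in). Note that f occurs at two distinct call sites, one per execution, so a relational specification of f has no single program point to attach to:

```python
def f(v):
    # stand-in body; for the self-composition proof, only a *precise*
    # functional specification of f would be usable
    return 2 * v

def self_composition(x, x2):
    # copy 1 over (x, y); copy 2 over the renamed variables (x2, y2)
    y = x if x else f(x)        # if (x) then {y := x} else {y := call f(x)}
    y2 = x2 if x2 else f(x2)    # same statement on primed variables
    return y, y2

# the trace property encoding determinism of the original program:
# identical inputs imply identical outputs
for v in range(5):
    y, y2 = self_composition(v, v)
    assert y == y2
```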

In this paper, we present modular product programs, a novel kind of product program that allows modular reasoning about hyperproperties. Modular product programs enable proving k-safety hyperproperties, i.e., hyperproperties that relate finite prefixes of k execution traces, for arbitrary values of k [12]. We achieve this via a transformation that, unlike existing products, does not duplicate loops or procedure calls, meaning that for any loop or call in the original program, there is exactly one statement in the k-product at which a relational specification can be applied. Like existing product programs, modular products can be reasoned about using off-the-shelf program verifiers.
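For intuition, a modular 2-product of the example statement can be sketched in Python (our encoding of the single-call-site idea; the names and the guarded style are ours, not the paper's formal syntax). Each execution i carries an activation variable p_i, control flow is kept single, and the call to f appears exactly once:

```python
def f_product(p1, p2, x1, x2):
    # ONE call site serving both executions: this is where a relational
    # specification of f ("equal inputs imply equal results") can apply.
    r1 = 2 * x1 if p1 else None   # hypothetical stand-in body
    r2 = 2 * x2 if p2 else None
    return r1, r2

def product(p1, p2, x1, x2):
    y1 = y2 = None
    # then-branch is active in execution i iff p_i holds and the branch
    # condition holds in that execution
    t1, t2 = p1 and bool(x1), p2 and bool(x2)
    if t1: y1 = x1
    if t2: y2 = x2
    # else-branch activation variables
    e1, e2 = p1 and not bool(x1), p2 and not bool(x2)
    r1, r2 = f_product(e1, e2, x1, x2)   # single, guarded call
    if e1: y1 = r1
    if e2: y2 = r2
    return y1, y2

# determinism of the original statement as a trace property of the product
for v in range(4):
    y1, y2 = product(True, True, v, v)
    assert y1 == y2
```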

We demonstrate the expressiveness of modular product programs by applying them to prove secure information flow, a 2-safety hyperproperty. We show how modular products enable proving traditional non-interference using natural and concise information flow specifications, and how to extend our approach for proving the absence of timing or termination channels, and supporting declassification in an intuitive way.

To summarize, we make the following contributions:


After giving an informal overview of our approach in Sect. 2 and introducing our programming and assertion language in Sect. 3, we formally define modular product programs in Sect. 4. We sketch a soundness proof in Sect. 5. Section 6 demonstrates how to apply modular products for proving secure information flow. We describe and evaluate our implementation in Sect. 7, discuss related work in Sect. 8, and conclude in Sect. 9.

### **2 Overview**

In this section, we will illustrate the core concepts behind modular k-products on an example program. We will first show how modular products are constructed, and subsequently demonstrate how they allow using relational specifications to modularly prove hyperproperties.

### **2.1 Relational Specifications**

Consider the example program in Fig. 1, which counts the number of female entries in a sequence of people. Now assume we want to prove that the program is deterministic, i.e., that its output state is completely determined by its input arguments. This can be expressed as a 2-safety hyperproperty which states that, for two terminating executions of the program with identical inputs, the outputs will be the same. This hyperproperty can be expressed by the *relational* (as opposed to *unary*) specification main : <sup>1</sup>people = <sup>2</sup>people - <sup>1</sup>count = <sup>2</sup>count, where <sup>i</sup>x refers to the value of the variable x in the ith execution.
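Fig. 1 itself is not reproduced in this excerpt. A minimal Python sketch consistent with its description is shown below; the parity-based encoding (even integers encode females, matching res = 1 − person mod 2 in the surrounding text) is an assumption, and the exact code in the figure may differ:

```python
def is_female(person):
    # Assumed encoding: the integer's parity encodes gender,
    # matching the specification res = 1 - person mod 2.
    if person % 2 == 0:
        res = 1
    else:
        res = 0
    return res

def main(people):
    # Count the female entries in the sequence.
    count = 0
    i = 0
    while i < len(people):
        current = people[i]
        count = count + is_female(current)
        i = i + 1
    return count
```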

Intuitively, it is possible to prove this specification by giving is_female a precise functional specification like is_female : *true* - res = 1 − person **mod** 2, meaning that is_female can be invoked in any state and that res = 1 − person **mod** 2 will hold if it returns. From this specification and an appropriate loop invariant, main can be shown to be deterministic. However, this specification

**Fig. 1.** Example program. The parameter people contains a sequence of integers that each encode attributes of a person; the main procedure counts the number of females in this sequence.

is unnecessarily strong. For proving determinism, it is irrelevant what exactly the final value of count is; it is only important that it is uniquely determined by the procedure's inputs. Proving hyperproperties using only unary specifications, however, critically depends on having exact specifications for every value returned by a called procedure, as well as all heap locations modified by it. Not only are such specifications difficult to infer and cumbersome to provide manually; this requirement also fundamentally removes the option of underspecifying program behavior, which is often desirable in practice. Because of these limitations, verification techniques that require precise functional specifications for proving hyperproperties often do not work well in practice, as observed by Terauchi and Aiken for the case of self-composition [28].

Proving determinism of the example program becomes much simpler if we are able to reason about two program executions at once. If both runs start with identical values for people then they will have identical values for people, i, and count when they reach the loop. Since the loop guard only depends on i and people, it will either be true for both executions or false for both. Assuming that is female behaves deterministically, all three variables will again be equal in both executions at the end of the loop body. This means that the program establishes and preserves the relational loop invariant that people, i, and count have identical values in both executions, from which we can deduce the desired relational postcondition. Our modular product programs enable this modular and intuitive reasoning, as we explain next.

#### **2.2 Modular Product Programs**

Like other product programs, our modular k-product programs multiply the state space of the original program by creating k renamed versions of all original variables. However, unlike other product programs, they do *not* duplicate control structures like loops or procedure calls, while still allowing different executions to take different paths through the program.

Modular product programs achieve this as follows: The set of transitions made by the execution of a product is the union of the transitions made by

**Fig. 2.** Modular 2-product of the program in Fig. 1 (slightly simplified). Parameters and local variables have been duplicated, but control flow statements have not. All statements are parameterized by activation variables.

the executions of the original program it represents. This means that if two executions of an if-then-else statement execute different branches, an execution of the product will execute the corresponding versions of *both* branches; however, it will be aware of the fact that each branch is taken by only one of the original executions, and the transformation of the statements *inside* each branch will ensure that the state of the other execution is not modified by executing it.

For this purpose, modular product programs use boolean *activation variables* that store, for each execution, the condition under which it is currently active. All activation variables are initially true. For every statement that directly changes the program state, the product performs the state change for all active executions. Control structures update which executions are active (for instance based on the loop condition) and pass this information down (into the branches of a conditional, the body of a loop, or the callee of a procedure call) to the level of atomic statements<sup>1</sup>. This representation avoids duplicating these control structures.

Figure 2 shows the modular 2-product of the program in Fig. 1. Consider first the main procedure. Its parameters have been duplicated: there are now two copies of all variables, one for each execution. This is analogous to self-composition or existing product programs. In addition, the transformed procedure has two boolean parameters p1 and p2; these variables are the initial

<sup>1</sup> The information stored in activation variables is similar to a path condition in symbolic execution, which is also updated every time a branch is taken. However, they differ for loops and calls.

activation variables of the procedure. Since main is the entry point of the program, the initial activation variables can be assumed to be true.

Consider what happens when the product is run with arbitrary input values for people1 and people2. The product will first initialize i1 and i2 to zero, like it does with i in the original program, and analogously for count1 and count2.

The loop in the original program has been transformed to a single loop in the product. Its condition is true if the original loop condition is true for any active execution. This means that the loop will iterate as long as at least one execution of the original program would. Inside the loop body, the fresh activation variables l1 and l2 represent whether the corresponding executions would execute the loop body. That is, for each execution, the respective activation variable will be true if the previous activation variable (p1 or p2, respectively) is true, meaning that this execution actually reaches the loop, and the loop guard is true for that execution. All statements in the loop body are then transformed using these new activation variables. Consequently, the loop will keep iterating while at least one execution executes the loop, but as soon as the loop guard is false for any execution, its activation variable will be false and the loop body will have no effect.

Conceptually, procedure calls are handled very similarly to loops. For the call to is_female in the original program, only a single call is created in the product. This call is executed if at least one activation variable is true, i.e., if at least one execution would perform the call in the original program. In addition to the (duplicated) arguments of the original call, the current activation variables are passed to the called procedure. In the transformed version of is_female, all statements are then made conditional on those activation variables. Therefore, like with loops, a call in the product will be performed if at least one execution would perform it in the original program, but it will have no effect on the state of the executions that are not active when the call is made.

The transformed version of is_female shows how conditionals are handled. We introduce four fresh activation variables t1, t2, f1, and f2, two for each execution. The first pair encodes whether the then-branch should be executed by either of the two executions; the second encodes the same for the else-branch. These activation variables are then used to transform the branches. Consequently, neither branch will have an effect for inactive executions, and exactly one branch has an effect for each active execution.
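As a concrete, executable rendering of this construction, the following Python sketch runs the 2-product just described. The names p1, p2, l1, l2, t1, t2, f1, f2 follow the prose; the dummy argument for inactive executions and the parity test in is_female are our own assumptions, so Fig. 2 may differ in detail:

```python
def is_female_prod(p1, p2, person1, person2):
    # Fresh activation variables: t1, t2 for the then-branch,
    # f1, f2 for the else-branch, one pair per execution.
    t1 = p1 and person1 % 2 == 0
    t2 = p2 and person2 % 2 == 0
    f1 = p1 and not person1 % 2 == 0
    f2 = p2 and not person2 % 2 == 0
    res1 = res2 = None
    if t1: res1 = 1
    if t2: res2 = 1
    if f1: res1 = 0
    if f2: res2 = 0
    return res1, res2

def main_prod(p1, p2, people1, people2):
    i1 = i2 = 0
    count1 = count2 = 0
    # A single loop that iterates while the guard holds for at
    # least one active execution.
    while (p1 and i1 < len(people1)) or (p2 and i2 < len(people2)):
        l1 = p1 and i1 < len(people1)
        l2 = p2 and i2 < len(people2)
        if l1: current1 = people1[i1]
        if l2: current2 = people2[i2]
        # A single call site; an inactive execution passes a dummy
        # argument (0), which the callee never uses.
        r1, r2 = is_female_prod(l1, l2,
                                current1 if l1 else 0,
                                current2 if l2 else 0)
        if l1: count1 = count1 + r1
        if l2: count2 = count2 + r2
        if l1: i1 = i1 + 1
        if l2: i2 = i2 + 1
    return count1, count2
```

Running the product with identical inputs for both executions yields identical outputs, which is exactly the determinism property the relational specification expresses.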

To summarize, our activation variables ensure that the sequence of state-changing statements executed by each execution is the same in the product and the original program. We achieve this without duplicating control structures or imposing restrictions on the control flow.

#### **2.3 Interpretation of Relational Specifications**

Since modular product programs do not duplicate calls, they provide a simple way of interpreting relational procedure specifications: if all executions call a procedure, its relational precondition is required to hold before the call and its relational postcondition afterwards. If a call is performed by some executions but not all, the relational specification is not meaningful, and thus cannot be required to hold. To encode this intuition, we transform every relational pre- or postcondition Q̂ of the original program into an implication (⋀<sup>k</sup><sub>i=1</sub> p<sub>i</sub>) ⇒ Q̂. In the transformed version, both pre- and postconditions are made conditional on the conjunction of all activation parameters p<sub>i</sub> of the procedure. As a result, both will be trivially true if at least one execution is not active at the call site.

In our example, we give is_female the relational specification is_female : *true* - (<sup>1</sup>person = <sup>2</sup>person ⇒ <sup>1</sup>res = <sup>2</sup>res), which expresses determinism. This specification will be transformed into a unary specification of the product program: is_female : (p1 ∧ p2 ⇒ *true*) - (p1 ∧ p2 ⇒ (person1 = person2 ⇒ res1 = res2)).

Assume for the moment that is_female also has a unary precondition person ≥ 0. Such a specification should hold for *every* call, and therefore for every active execution, even if other executions are inactive. Therefore, its interpretation in the product program is (p1 ⇒ person1 ≥ 0) ∧ (p2 ⇒ person2 ≥ 0). The translation of other unary assertions is analogous.
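To make the shape of these transformed contracts concrete, here is a hypothetical run-time rendering in Python, with the unary precondition and the relational postcondition written as assertions (a verifier would of course discharge these statically rather than at run time):

```python
def is_female_checked(p1, p2, person1, person2):
    # Transformed unary precondition: person >= 0 must hold for each
    # active execution separately.
    assert (not p1) or person1 >= 0
    assert (not p2) or person2 >= 0
    res1 = 1 - person1 % 2 if p1 else None
    res2 = 1 - person2 % 2 if p2 else None
    # Transformed relational postcondition (determinism): checked only
    # when both executions are active at the call site.
    assert (not (p1 and p2)) or person1 != person2 or res1 == res2
    return res1, res2
```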

Note that it is possible (and useful) to give a procedure both a relational and a unary specification; in the product this is encoded by simply conjoining the transformed versions of the unary and the relational assertions.

### **2.4 Product Program Verification**

We can now prove determinism of our example using the product program. Verifying is_female is simple. For main, we want to prove the transformed specification main : (p1 ∧ p2 ⇒ people1 = people2) - (p1 ∧ p2 ⇒ count1 = count2). We use the relational loop invariant <sup>1</sup>i = <sup>2</sup>i ∧ <sup>1</sup>count = <sup>2</sup>count ∧ <sup>1</sup>people = <sup>2</sup>people, encoded as p1 ∧ p2 ⇒ i1 = i2 ∧ count1 = count2 ∧ people1 = people2. The loop invariant holds trivially if either p1 or p2 is false. Otherwise, it ensures l1 = l2 and current1 = current2. Using the specification of is_female, we obtain t1 = t2, which implies that the loop invariant is preserved. The loop invariant implies the postcondition.

### **3 Preliminaries**

We model our setting according to the relational logic by Banerjee, Naumann, and Nikouei [5]<sup>2</sup> and, like them, use a standard Hoare logic [4] to reason about single program executions. Figure 3 shows the language we use to define modular product programs. x ranges over the set of local integer variable names Var. Note that this language is deterministic; non-determinism can, for example, be modelled via additional inputs, as is often done for modelling fairness in concurrent programs [16]. Program configurations have the form ⟨s, σ⟩, where σ ∈ Σ maps variable names to values. The value of expression e in state σ is

<sup>2</sup> Our handling of procedure calls is slightly different, but amounts to restricting procedures to work only on local variables not used in the rest of the program (as opposed to having a global state on which all procedures work directly), and only interacting with the rest of the program via explicitly declared return parameters.


denoted as σ(e). The small-step transition relation for program configurations has the form ⟨s, σ⟩ → ⟨s′, σ′⟩. A hypothesis context Φ maps procedure names to specifications.

The judgment Φ ⊢ s : P - Q denotes that statement s, when executed in a state fulfilling the unary assertion P, will not fault, and if the execution terminates, the resulting state will fulfill the unary assertion Q. For an extensive discussion of the language and its operational and axiomatic semantics, see [5].

In addition to standard unary expressions and assertions, we define relational expressions and assertions. They differ from normal expressions and assertions in that they contain parameterized variable references of the form <sup>i</sup>x and are evaluated over a tuple of states instead of a single one. A relational expression is k-relational if 1 ≤ i ≤ k holds for all contained variable references <sup>i</sup>x, and analogously for relational assertions. The value of a variable reference <sup>i</sup>x with 1 ≤ i ≤ k in a tuple of states (σ<sub>1</sub>,...,σ<sub>k</sub>) is σ<sub>i</sub>(x); the evaluation of arbitrary relational expressions and the validity of relational assertions, written (σ<sub>1</sub>,...,σ<sub>k</sub>) ⊨ P̂, are defined accordingly.

**Definition 1.** *A* k*-relational specification* s : P̂ -<sub>k</sub> Q̂ *holds iff* P̂ *and* Q̂ *are* k*-relational assertions, and for all* σ<sub>1</sub>,...,σ<sub>k</sub>, σ′<sub>1</sub>,...,σ′<sub>k</sub>*, if* (σ<sub>1</sub>,...,σ<sub>k</sub>) ⊨ P̂ *and* ∀i ∈ {1,...,k}. ⟨s, σ<sub>i</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>i</sub>⟩*, then* (σ′<sub>1</sub>,...,σ′<sub>k</sub>) ⊨ Q̂*.*

We write s : P̂ - Q̂ for the most common case s : P̂ -<sub>2</sub> Q̂.

### **4 Modular** *k***-Product Programs**

In this section, we define the construction of modular products for arbitrary k. We will subsequently define the transformation of both relational and unary specifications to modular products.

### **4.1 Product Construction**

Assume as given a function (Var, ℕ) → Var that renames variables for different executions. We write e<sup>(i)</sup> for the renaming of expression e for execution i and require that ∀x, y, i, j. i ≠ j ⇒ x<sup>(i)</sup> ≠ y<sup>(j)</sup>. We write fresh(x<sub>1</sub>, x<sub>2</sub>,...) to denote that the variable names x<sub>1</sub>, x<sub>2</sub>,... are fresh names that do not occur in the program and have not yet been used during the transformation. e̊ is used to abbreviate e<sup>(1)</sup>,...,e<sup>(k)</sup>.

We denote the modular k-product of a statement s that is parameterized by the activation variables p<sup>(1)</sup>,...,p<sup>(k)</sup> as ⟦s⟧<sup>p̊</sup><sub>k</sub>. The product construction for procedures is defined as

$$\begin{aligned}
& \|\text{procedure } m(x\_1, \ldots, x\_m) \text{ returns } (y\_1, \ldots, y\_n)\,\{s\}\|\_k \\
& \quad = \text{procedure } m(p^{(1)}, \ldots, p^{(k)}, args) \text{ returns } (rets)\,\{\|s\|\_k^{\mathring{p}}\} \\
& \quad \text{where} \\
& \qquad args = x\_1^{(1)}, \ldots, x\_1^{(k)}, \ldots, x\_m^{(1)}, \ldots, x\_m^{(k)} \\
& \qquad rets = y\_1^{(1)}, \ldots, y\_1^{(k)}, \ldots, y\_n^{(1)}, \ldots, y\_n^{(k)}
\end{aligned}$$

Figure 4 shows the product construction rules for statements, which generalize the transformation explained in Sect. 2. We write if (e) then {s} as a shorthand for if (e) then {s} else {skip}, and ⨟<sup>k</sup><sub>i=1</sub> s<sub>i</sub> for the sequential composition of the k statements s<sub>1</sub>; ... ; s<sub>k</sub>.

The core principle behind our encoding is that statements that directly change the state are duplicated for each execution and made conditional under the respective activation variables, whereas control statements are not duplicated and instead manipulate the activation variables to pass activation information to their sub-statements. This enables us to assert or assume relational assertions before and after any statement from the original program. The only state-changing statements in our language, variable assignments, are therefore transformed to a sequence of conditional assignments, one for each execution. Each assignment is executed only if the respective execution is currently active.

Duplicating conditionals would also duplicate the calls and loops in their branches. To avoid that, modular products eliminate top-level conditionals; instead, new activation variables are created and assigned the values of the current activation variables conjoined with the guard for each branch. The branches are then sequentially executed based on their respective activation variables.

A while loop is transformed to a single while loop in the product program that iterates as long as the loop guard is true for *any* active execution. Inside the loop, fresh activation variables indicate whether an execution reaches the loop *and* its loop condition is true. The loop body will then modify the state of an execution only if its activation variable is true. The resulting construct affects the program state in the same way as a self-composition of the original loop would, but the fact that our product contains only a single loop enables us to use relational loop invariants instead of full functional specifications.

For procedure calls, it is crucial that the product contains a single call for every call in the original program, in order to be able to apply relational specifications at the call site. As explained before, initial activation parameters are added to every procedure declaration, and all parameters are duplicated k times.

**Fig. 4.** Construction rules for statement products.

Procedure calls are therefore transformed such that the values of the current activation variables are passed, and all arguments are passed once for each execution. The return values are stored in temporary variables and subsequently assigned to the actual target variables only for those executions that actually execute the call, so that for all other executions, the target variables are not affected.

The transformation wraps the call in a conditional so that the call is performed only if at least one execution is active. This prevents the transformation from introducing infinite recursion that is not present in the original program.

Note that for an inactive execution i, arbitrary argument values are passed in procedure calls, since the passed variables a<sub>j</sub><sup>(i)</sup> are not initialized. This is unproblematic because these values will not be used by the procedure. It is important not to evaluate e<sub>j</sub><sup>(i)</sup> for inactive executions, since this could lead to false alarms for languages where expression evaluation can fail.
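The rules for skip, assignment, sequence, conditionals, and loops can be prototyped as a small syntax-to-syntax transformer. The following Python sketch is our own illustration (the tuple-based AST, the helper names, and the tiny interpreter are assumptions, and Fig. 4 may differ in detail), but it implements the shape of the construction described above:

```python
import itertools

counter = itertools.count(1)  # source of fresh activation-variable names

def ev(e, s):
    # Evaluate an expression tree in store s.
    op = e[0]
    if op == 'const': return e[1]
    if op == 'var':   return s[e[1]]
    if op == 'lt':    return ev(e[1], s) < ev(e[2], s)
    if op == 'add':   return ev(e[1], s) + ev(e[2], s)
    if op == 'and':   return ev(e[1], s) and ev(e[2], s)
    if op == 'or':    return ev(e[1], s) or ev(e[2], s)
    if op == 'not':   return not ev(e[1], s)
    raise ValueError(op)

def run(st, s):
    # Execute a statement tree, mutating store s.
    op = st[0]
    if op == 'skip':
        pass
    elif op == 'assign':
        s[st[1]] = ev(st[2], s)
    elif op == 'seq':
        for sub in st[1:]:
            run(sub, s)
    elif op == 'if':
        run(st[2] if ev(st[1], s) else st[3], s)
    elif op == 'while':
        while ev(st[1], s):
            run(st[2], s)
    else:
        raise ValueError(op)

def ren(e, i):
    # Rename every variable in expression e for execution i.
    if e[0] == 'var':
        return ('var', f"{e[1]}^{i}")
    if e[0] == 'const':
        return e
    return (e[0],) + tuple(ren(sub, i) for sub in e[1:])

def product(st, p, k=2):
    # Modular k-product of statement st under activation variables p.
    op = st[0]
    if op == 'skip':
        return ('skip',)
    if op == 'assign':
        # Duplicate the assignment, guarded per execution.
        return ('seq',) + tuple(
            ('if', ('var', p[i]),
             ('assign', f"{st[1]}^{i + 1}", ren(st[2], i + 1)),
             ('skip',))
            for i in range(k))
    if op == 'seq':
        return ('seq',) + tuple(product(sub, p, k) for sub in st[1:])
    if op == 'if':
        # Fresh activation variables for the then- and else-branches,
        # then both branches executed sequentially under them.
        n = next(counter)
        t = [f"t{n}_{i + 1}" for i in range(k)]
        f = [f"f{n}_{i + 1}" for i in range(k)]
        setup = [('assign', t[i], ('and', ('var', p[i]), ren(st[1], i + 1)))
                 for i in range(k)]
        setup += [('assign', f[i],
                   ('and', ('var', p[i]), ('not', ren(st[1], i + 1))))
                  for i in range(k)]
        return ('seq', *setup, product(st[2], t, k), product(st[3], f, k))
    if op == 'while':
        # A single loop, iterating while the guard holds for any
        # active execution; fresh variables track who is active.
        n = next(counter)
        l = [f"l{n}_{i + 1}" for i in range(k)]
        cond = ('and', ('var', p[0]), ren(st[1], 1))
        for i in range(1, k):
            cond = ('or', cond, ('and', ('var', p[i]), ren(st[1], i + 1)))
        upd = [('assign', l[i], ('and', ('var', p[i]), ren(st[1], i + 1)))
               for i in range(k)]
        return ('while', cond, ('seq', *upd, product(st[2], l, k)))
    raise ValueError(op)
```

For example, the 2-product of if (x < 10) then {x := x + 1} else {x := 0} updates x^1 and x^2 independently even when the two executions take different branches, without duplicating the conditional's sub-statements per branch taken.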

#### **4.2 Transformation of Assertions**

We now define how to transform unary and relational assertions for use in a modular product.

Unary assertions such as ordinary procedure preconditions describe state properties that should hold for every single execution. When checking or assuming that a unary assertion holds at a specific point in the program, we need to take into account that it only makes sense to do so for executions that actually reach that program point. We can express this by making the assertion conditional on the activation variable of the respective execution; as a result, any unary assertion is trivially valid for all inactive executions.

A k-relational assertion, on the other hand, describes the relation between the states of all k executions. Checking or assuming a relational assertion at some point is meaningful only if *all* executions actually reach that point. This can be expressed by making relational assertions conditional on the conjunction of all current activation variables. If at least one execution does not reach the assertion, it holds trivially.

We formalize this idea by defining a function α that maps relational assertions P̂ to unary assertions of the product program by replacing every variable reference <sup>i</sup>x with the renamed variable x<sup>(i)</sup>, i.e., α(P̂) = P̂[Var<sup>(1)</sup>/<sup>1</sup>Var] ... [Var<sup>(k)</sup>/<sup>k</sup>Var]. Assertions can then be transformed for use in a k-product as follows:

$$\begin{aligned}
\|P\|\_k^{\mathring{p}} &= \bigwedge\_{i=1}^{k} \left(p^{(i)} \Rightarrow P^{(i)}\right) \\
\|\hat{P}\|\_k^{\mathring{p}} &= \left(\bigwedge\_{i=1}^{k} p^{(i)}\right) \Rightarrow \alpha(\hat{P})
\end{aligned}$$

where P<sup>(i)</sup> denotes P with all variables renamed for execution i.
Importantly, our approach allows using *mixed* assertions and specifications, which represent conjunctions of unary and relational assertions. For example, it is common to combine a unary precondition that ensures that a procedure will not raise an error with a relational postcondition that states that it is deterministic.

A mixed assertion Ř of the form P ∧ Q̂ means that the unary assertion P holds for every single execution, and if all executions are currently active, the relational assertion Q̂ holds as well. The transformation of mixed assertions is straightforward: ⟦Ř⟧<sup>p̊</sup><sub>k</sub> = ⟦P⟧<sup>p̊</sup><sub>k</sub> ∧ ⟦Q̂⟧<sup>p̊</sup><sub>k</sub>.

#### **4.3 Heap-Manipulating Programs**

The approach outlined so far can easily be extended to programs that work on a mutable heap, assuming that object references are opaque, i.e., they cannot be inspected or used in arithmetic. In order to create a distinct state space for each execution represented in the product, allocation statements are duplicated and made conditional like assignments, and therefore create a different object for each active execution. The renaming of a field dereference e.f is then defined as e(*i*).f. As a result, the heap of a k-product will consist of k partitions that do not contain references to each other, and execution i will only ever interact with objects from its partition of the heap.

The verification of modular products of heap-manipulating programs does not depend on any specific way of achieving framing. Our implementation is based on implicit dynamic frames [25], but other approaches are feasible as well, provided that procedures can be specified in such a way that the caller knows the heap stays unmodified for all executions whose activation variables are false.

Since the handling of the heap is largely orthogonal to our main technique, we will not go into further detail here, but we do support heap-manipulating programs in our implementation.

### **5 Soundness and Completeness**

A product construction is sound if an execution of a k-product mirrors k separate executions of the original program such that properties proved about the product entail hyperproperties of the original program. In this section, we sketch a soundness proof of our k-product construction in the presence of only unary procedure specifications. We also sketch a proof for relational specifications for the case k = 2, making use of the relational logic presented by Banerjee et al. [5]. Finally, we informally discuss the completeness of modular products.

### **5.1 Soundness with Unary Specifications**

A modular k-product must soundly encode k executions of the original program. That is, if an encoded unary specification holds for a product program then the original specification holds for the original program.

We define a relation σ ≽<sub>i</sub> σ′ that denotes that σ contains a renamed version of all variables in σ′, i.e., ∀v ∈ dom(σ′). σ(v<sup>(i)</sup>) = σ′(v). Without the index i, ≽ denotes the same relation without renaming, and is used to express equality modulo newly introduced activation variables.

**Theorem 1.** *Assume that for all procedures* m *in a hypothesis context* Φ *we have that* m : S - T ∈ dom(Φ) *if and only if* m : ⟦S⟧<sup>p̊</sup><sub>k</sub> - ⟦T⟧<sup>p̊</sup><sub>k</sub> ∈ dom(Φ′)*. Then* Φ′ ⊢ ⟦s⟧<sup>p̊</sup><sub>k</sub> : ⟦P⟧<sup>p̊</sup><sub>k</sub> - ⟦Q⟧<sup>p̊</sup><sub>k</sub> *implies that* Φ ⊢ s : P - Q*.*

*Proof (Sketch).* We sketch a proof based on the operational semantics of our language. We show that the execution of the product program with exactly one active execution corresponds to a single execution of the original program.

Assume that Φ′ ⊢ ⟦s⟧<sup>p̊</sup><sub>k</sub> : ⟦P⟧<sup>p̊</sup><sub>k</sub> - ⟦Q⟧<sup>p̊</sup><sub>k</sub>, and that σ ⊨ ⟦P⟧<sup>p̊</sup><sub>k</sub>. If ⟦s⟧<sup>p̊</sup><sub>k</sub> does not diverge when executed from σ, we have that ⟨⟦s⟧<sup>p̊</sup><sub>k</sub>, σ⟩ →<sup>∗</sup> ⟨skip, σ′⟩ and σ′ ⊨ ⟦Q⟧<sup>p̊</sup><sub>k</sub>. We now prove that a run of the product with all but one execution being inactive reflects the states that occur in a run of the original program. Assume that σ ⊨ p<sup>(1)</sup> ∧ ⋀<sup>k</sup><sub>i=2</sub> ¬p<sup>(i)</sup> and ⟨s, σ<sub>1</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>1</sub>⟩ and initially σ ≽<sub>1</sub> σ<sub>1</sub>, which implies σ<sub>1</sub> ⊨ P. We prove by induction on the derivation of ⟨s, σ<sub>1</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>1</sub>⟩ that ⟨⟦s⟧<sup>p̊</sup><sub>k</sub>, σ⟩ →<sup>∗</sup> ⟨skip, σ′⟩, meaning that the product execution terminates, and subsequently by induction on the derivation of ⟨⟦s⟧<sup>p̊</sup><sub>k</sub>, σ⟩ →<sup>∗</sup> ⟨skip, σ′⟩ that σ′ ≽<sub>1</sub> σ′<sub>1</sub>, from which we can derive that σ′<sub>1</sub> ⊨ Q.

#### **5.2 Soundness for Relational Specifications**

The main advantage of modular product programs over other kinds of product programs is that they allow reasoning about procedure calls in terms of relational specifications. We therefore need to show the soundness of our approach in the presence of procedures with such specifications. In particular, we must establish that if a transformed relational specification holds for a modular product, then the original relational specification will hold for a set of k executions of the original program.

Our proof sketch is phrased in terms of *biprograms* as introduced by Banerjee et al. [5]. Biprogram executions correspond to two partly aligned executions of their two underlying programs. A biprogram ss can have the form (s<sub>1</sub>|s<sub>2</sub>) or ⌊s⌋; the former represents the two executions of s<sub>1</sub> and s<sub>2</sub>, whereas the latter represents an aligned execution of s by both executions, which enables using relational specifications for procedure calls<sup>3</sup>. We denote the small-step transition relation between biprogram configurations as ⟨ss, σ<sub>1</sub>|σ<sub>2</sub>⟩ ⟼<sup>∗</sup> ⟨ss′, σ′<sub>1</sub>|σ′<sub>2</sub>⟩. We make use of a relation σ ≽ σ<sub>1</sub>|σ<sub>2</sub> that denotes that σ contains renamed versions of all variables in both σ<sub>1</sub> and σ<sub>2</sub> with the same values.

Biprograms do not allow mixed procedure specifications, meaning that a procedure can either have only a unary specification, or it can have only a relational specification, in which case it can only be invoked by both executions simultaneously. As mentioned before, our approach does not have this limitation, but we can artificially enforce it for the purposes of the soundness proof.

We can now state our theorem. Since biprograms represent the execution of two programs, we formulate soundness for k = 2 here.

**Theorem 2.** *Assume that hypothesis context* Φ *maps procedure names to relational specifications if all calls to the procedure in* s *can be aligned from any pair of states satisfying* P̂*, and to unary specifications otherwise. Assume further that hypothesis context* Φ′ *maps the same procedure names to their transformed specifications. Finally, assume that* Φ′ ⊢ ⟦s⟧<sup>p̊</sup><sub>2</sub> : ⟦P̂⟧<sup>p̊</sup><sub>2</sub> - ⟦Q̂⟧<sup>p̊</sup><sub>2</sub> *and* (σ<sub>1</sub>, σ<sub>2</sub>) ⊨ P̂*. If* ⟨s, σ<sub>1</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>1</sub>⟩ *and* ⟨s, σ<sub>2</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>2</sub>⟩*, then* (σ′<sub>1</sub>, σ′<sub>2</sub>) ⊨ Q̂*.*

*Proof (Sketch).* The proof follows the same basic outline as the one for Theorem 1 but reasons about the operational semantics of biprograms representing two executions of s.

Assume that Φ′ ⊢ ⟦s⟧<sup>p̊</sup><sub>2</sub> : ⟦P̂⟧<sup>p̊</sup><sub>2</sub> - ⟦Q̂⟧<sup>p̊</sup><sub>2</sub> and σ ⊨ ⟦P̂⟧<sup>p̊</sup><sub>2</sub>. If ⟦s⟧<sup>p̊</sup><sub>2</sub> does not diverge when executed from σ, we get that ⟨⟦s⟧<sup>p̊</sup><sub>2</sub>, σ⟩ →<sup>∗</sup> ⟨skip, σ′⟩ and σ′ ⊨ ⟦Q̂⟧<sup>p̊</sup><sub>2</sub>. Assume that initially σ ≽ σ<sub>1</sub>|σ<sub>2</sub>, which implies that (σ<sub>1</sub>, σ<sub>2</sub>) ⊨ P̂. We prove by induction on the derivation of ⟨⟦s⟧<sup>p̊</sup><sub>2</sub>, σ⟩ →<sup>∗</sup> ⟨skip, σ′⟩ that (1) if σ ⊨ p<sup>(1)</sup> ∧ p<sup>(2)</sup>, then there exists ss that represents two executions of s s.t. ⟨ss, σ<sub>1</sub>|σ<sub>2</sub>⟩ ⟼<sup>∗</sup> ⟨skip, σ′<sub>1</sub>|σ′<sub>2</sub>⟩ and σ′ ≽ σ′<sub>1</sub>|σ′<sub>2</sub>; (2) if σ ⊨ p<sup>(1)</sup> ∧ ¬p<sup>(2)</sup>, then ⟨s, σ<sub>1</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>1</sub>⟩ and σ′ ≽ σ′<sub>1</sub>|σ<sub>2</sub>; (3) if σ ⊨ ¬p<sup>(1)</sup> ∧ p<sup>(2)</sup>, then ⟨s, σ<sub>2</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>2</sub>⟩ and σ′ ≽ σ<sub>1</sub>|σ′<sub>2</sub>; (4) if σ ⊨ ¬p<sup>(1)</sup> ∧ ¬p<sup>(2)</sup>, then σ′ agrees with σ on all original variables. From the first point and semantic consistency

<sup>3</sup> We modified the original notation to avoid clashes with our own concepts introduced earlier.

of the relational logic, we can conclude that (σ′<sub>1</sub>, σ′<sub>2</sub>) ⊨ Q̂. Finally, we prove that ⟨⟦s⟧<sup>p̊</sup><sub>2</sub>, σ⟩ →<sup>∗</sup> ⟨skip, σ′⟩ by showing that non-termination of the product implies the non-termination of at least one of the two original program runs. If the condition of a loop in the product remains true forever, the loop condition of at least one encoded execution must be true after every iteration. We show that (1) this is not due to an interaction of multiple executions, since the condition for every execution will remain false once it becomes false, and (2) since the encoded states of active executions progress as they do in the original program, the condition of a single execution in the product remains true forever only if it does in the original program. A similar argument shows that the product cannot diverge because of infinite recursive calls.

### **5.3 Completeness**

We believe modular product programs to be complete, meaning that any hyperproperty of multiple executions of a program can be proved about its modular product program. Since the product faithfully models the executions of the original program, the completeness of modular products is potentially limited only by the underlying verification logic and the assertion language, but not by the product construction itself.

### **6 Modular Verification of Secure Information Flow**

In this section, we demonstrate the expressiveness of modular product programs by showing how they can be used to verify an important hyperproperty, information flow security. We first concentrate on secure information flow in the classical sense [9], and later demonstrate how the ability to check relational assertions at any point in the program can be exploited to prove advanced properties like the absence of timing and termination channels, and to encode declassification.

### **6.1 Non-interference**

Secure information flow, i.e., the property that secret information is not leaked to the public outputs of a program, can be expressed as a relational 2-safety property of a program called *non-interference*. Non-interference states that, if a program is run twice, with the public (often called *low*) inputs being equal in both runs but the secret (or *high*) inputs possibly being different, the public outputs of the program must be equal in both runs [8]. This property guarantees that the high inputs do not influence the low outputs.

We can formalize non-interference as follows:

**Definition 2.** *A statement* s *that operates on a set of variables* X = {x<sub>1</sub>,...,x<sub>n</sub>}*, of which some subset* X<sub>l</sub> ⊆ X *is low, satisfies non-interference iff for all* σ<sub>1</sub>, σ<sub>2</sub> *and* σ′<sub>1</sub>, σ′<sub>2</sub>*, if* ∀x ∈ X<sub>l</sub>. σ<sub>1</sub>(x) = σ<sub>2</sub>(x) *and* ⟨s, σ<sub>1</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>1</sub>⟩ *and* ⟨s, σ<sub>2</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>2</sub>⟩*, then* ∀x ∈ X<sub>l</sub>. σ′<sub>1</sub>(x) = σ′<sub>2</sub>(x)*.*
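Definition 2 can be made executable for finitely many candidate stores. The following Python sketch is our own illustration of the definition, not the paper's technique (which proves non-interference statically via the product program): it searches a finite set of input stores for a counterexample pair of executions.

```python
def violates_ni(prog, low, stores):
    # Search for two input stores that agree on all low variables but
    # lead to final stores that differ on some low variable.
    for s1 in stores:
        for s2 in stores:
            if all(s1[x] == s2[x] for x in low):
                o1, o2 = prog(dict(s1)), prog(dict(s2))
                if any(o1[x] != o2[x] for x in low):
                    return True
    return False

def leaky(s):
    # The low output l copies the high input h: insecure.
    s['l'] = s['h']
    return s

def secure(s):
    # The low output depends only on the low input: secure.
    s['l'] = s['l'] + 1
    return s
```

Note that a positive search result refutes non-interference, while an exhaustive search over all relevant stores would be needed to establish it; the product-program approach avoids this by proving the relational property deductively.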

Since our definition of non-interference describes a hyperproperty, we can verify it using modular product programs:

**Theorem 3.** *A statement* s *that operates on a set of variables* X = {x<sub>1</sub>,...,x<sub>n</sub>}*, of which some subset* X<sub>l</sub> ⊆ X *is low, satisfies non-interference under a unary precondition* P *if*

$$\models [\![s]\!]^{\bar{p}}_{2} : [\![P]\!]^{\bar{p}}_{2} \land (\forall x \in X_l.\ x^{(1)} = x^{(2)}) \rightsquigarrow \forall x \in X_l.\ x^{(1)} = x^{(2)}$$

*Proof (Sketch).* Since non-interference can be expressed using a 2-relational specification, the theorem follows directly from Theorem 2.

For non-deterministic programs whose behavior can be modelled by adding input parameters representing the non-deterministic choices, those parameters can be considered low if the choice is not influenced in any way by secret data.

An expanded notion of secure information flow considers observable *events* in addition to regular program outputs [17]. An event is a statement that has an effect that is visible to an outside observer, but may not necessarily affect the program state. The most important examples of events are output operations like printing a string to the console or sending a message over a network. Programs that cause events can be considered information flow secure only if the sequence of produced events is not influenced by high data. One way to verify this using our approach is to track the sequence of produced events in a ghost variable and verify that its value never depends on high data. This approach requires substantial amounts of additional specifications.

Modular product programs offer an alternative approach for preventing leaks via events, since they allow formulating assertions about the relation between the activation variables of different executions. In particular, if a given event has the precondition that all activation variables are equal when the event statement is reached then this event will either be executed by both executions or be skipped by both executions. As a result, the sequence of events produced by a program will be equal in all executions.
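The role of the activation variables can be illustrated with a hand-written 2-product (a conceptual sketch; the program, variable names, and the runtime `assert` standing in for the verifier's check are ours): each statement is guarded by an activation variable, and the event's *lowEvent* precondition becomes the relational check that the activation variables are equal.

```python
# Hand-written modular 2-product of the program
#   if h > 0: print(x)
# Each statement is guarded by an activation variable; the event
# precondition "lowEvent" becomes the relational check q1 == q2.
def product_program(x1, x2, h1, h2, events):
    p1, p2 = True, True        # both executions active initially
    q1 = p1 and h1 > 0         # activation variables for the then-branch
    q2 = p2 and h2 > 0
    # lowEvent precondition of print: both executions must agree on
    # whether the event happens at all.
    assert q1 == q2, "control flow of the event depends on high data"
    if q1:
        events.append(x1)
    if q2:
        events.append(x2)

events = []
product_program(5, 5, 1, 2, events)   # both highs positive: event fires in both
assert events == [5, 5]

try:
    product_program(5, 5, 1, -1, events)  # highs disagree on the branch
    caught = False
except AssertionError:
    caught = True
assert caught                          # the relational check rejects the leak
```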

### **6.2 Information Flow Specifications**

The relational specifications required for modularly proving non-interference with the previously described approach have a specific pattern: they can contain functional specifications meant to be valid for both executions (e.g., to make sure both executions run without errors), they may require that some information is low, which is equivalent to the two renamings of the same expression being equal, and, in addition, they may assert that the control flow at a specific program point is low.

We therefore introduce modular *information flow specifications*, which can express all properties required for proving secure information flow but are transparent w.r.t. the encoding or the verification methodology, i.e., they allow expressing that a given operation or value must not be secret without knowledge of the encoding of this fact into an assertion about two different program executions. We define information flow specifications as follows:

$$(\mathit{SIFAssertions})\ \tilde{P} ::= \tilde{P} \land \tilde{P} \mid e \mid \mathit{low}(e) \mid \mathit{lowEvent} \mid \tilde{P} \Rightarrow \tilde{P} \mid \forall x.\ \tilde{P}$$

*low*(e) and *lowEvent* may be used on the left side of an implication only if the right side has the same form. *low*(e) specifies that the value of the expression e is not influenced by high data. Note that e can be any expression and is not limited to variable references; this reflects the fact that our approach can label secrecy in a more fine-grained way than, e.g., a type system. One can, for example, declare to be public whether a number is odd while keeping its value secret.

$$\begin{array}{ll}
[\![e]\!] &= (p^{(1)} \Rightarrow e^{(1)}) \land (p^{(2)} \Rightarrow e^{(2)}) \\
[\![\mathit{low}(e)]\!] &= (p^{(1)} \land p^{(2)} \Rightarrow e^{(1)} = e^{(2)}) \\
[\![\mathit{lowEvent}]\!] &= (p^{(1)} = p^{(2)}) \\
[\![\tilde{P}_1 \land \tilde{P}_2]\!] &= [\![\tilde{P}_1]\!] \land [\![\tilde{P}_2]\!] \\
[\![\tilde{P}_1 \Rightarrow \tilde{P}_2]\!] &= [\![\tilde{P}_1]\!] \Rightarrow [\![\tilde{P}_2]\!] \\
[\![\forall x.\ \tilde{P}]\!] &= \forall x^{(1)}, x^{(2)}.\ x^{(1)} = x^{(2)} \Rightarrow [\![\tilde{P}]\!]
\end{array}$$

**Fig. 5.** Translation of information flow specifications.
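The translation of Fig. 5 is purely syntactic and can be sketched as a short recursive function (an illustrative rendering, not the paper's implementation: assertions are tuples, expressions are strings, renaming is shown as an `_1`/`_2` suffix, and the activation variables are spelled `p1`/`p2`).

```python
# A minimal implementation of the translation in Fig. 5 over a tiny
# assertion AST of tuples. Expression renaming e -> e_1 / e_2 is
# rendered textually.
def rename(e, i):
    return f"({e})_{i}"

def translate(a):
    tag = a[0]
    if tag == "expr":                 # unary assertion e, checked per execution
        e = a[1]
        return f"(p1 ==> {rename(e, 1)}) && (p2 ==> {rename(e, 2)})"
    if tag == "low":                  # low(e): renamings equal if both active
        e = a[1]
        return f"(p1 && p2 ==> {rename(e, 1)} == {rename(e, 2)})"
    if tag == "lowEvent":             # control flow agrees
        return "p1 == p2"
    if tag == "and":
        return f"{translate(a[1])} && {translate(a[2])}"
    if tag == "implies":
        return f"({translate(a[1])} ==> {translate(a[2])})"
    raise ValueError(f"unknown assertion: {tag}")

# low(x mod 2): the parity of x is public even if x itself is secret.
assert translate(("low", "x % 2")) == "(p1 && p2 ==> (x % 2)_1 == (x % 2)_2)"
assert translate(("lowEvent",)) == "p1 == p2"
```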

*lowEvent* specifies that high data must not influence if and how often the current program point is reached by an execution, which is a sufficient precondition of any statement that causes an observable event. In particular, if a procedure outputs an expression e, the precondition *lowEvent* ∧ *low*(e) guarantees that no high information will be leaked via this procedure.

Information flow specifications can express complex properties. e<sup>1</sup> ⇒ *low*(e2), for example, expresses that if e<sup>1</sup> is true, e<sup>2</sup> must not depend on high data; e<sup>1</sup> ⇒ *lowEvent* says the same about the current control flow. A possible use case for these assertions is the precondition of a library function that prints e<sup>2</sup> to a low-observable channel if e<sup>1</sup> is true, and to a secure channel otherwise.

The encoding ⟦P̃⟧<sup>p̄</sup> of an information flow assertion P̃ under the activation variables p<sup>(1)</sup> and p<sup>(2)</sup> is defined in Fig. 5. Note that high-ness of an expression is not modelled by its renamings being definitely unequal, but by leaving underspecified whether they are equal, meaning that high-ness is simply the absence of the knowledge of low-ness. As a result, it is never necessary to specify explicitly that an expression is high. This approach (which is also used in self-composition) is analogous to the way type systems encode security levels, where low is typically a subtype of high. For the example in Fig. 1, a possible, very precise information flow specification could state that the result of main is low if the first bit of every entry in people is low. We can write this as main : *low*(|people|) ∧ ∀i ∈ {0,..., |people| − 1}. *low*(people[i] **mod** 2) ⤳ *low*(count). In the product, this is translated to main : p<sup>(1)</sup> ∧ p<sup>(2)</sup> ⇒ |people<sup>(1)</sup>| = |people<sup>(2)</sup>| ∧ ∀i ∈ {0,..., |people<sup>(1)</sup>| − 1}. (people<sup>(1)</sup>[i] **mod** 2) = (people<sup>(2)</sup>[i] **mod** 2) ⤳ count<sup>(1)</sup> = count<sup>(2)</sup>.

In this scenario, the loop in main could have the simple invariant *low*(i) ∧ *low*(count), and the procedure is_female could have the contract is_female : *true* ⤳ (*low*(person **mod** 2) ⇒ *low*(res)). This contract follows a useful pattern

**Fig. 6.** Password check example: leaking secret data is desired.

where, instead of requiring an input to be low and promising that an output will be low for all calls, the output is described as *conditionally* low based on the level of the input, which is more permissive for callers.

The example shows that the information relevant for proving secure information flow can be expressed concisely, without requiring any knowledge about the methodology used for verification. Modular product programs therefore enable the verification of the information flow security of main based solely on modular, relational specifications, and without depending on functional specifications.

### **6.3 Secure Information Flow with Arbitrary Security Lattices**

The definition of secure information flow used in Definition 2 is a special case in which there are exactly two possible classifications of data, high and low. In the more general case, classifications come from an arbitrary lattice L of security levels such that, for l<sub>1</sub>, l<sub>2</sub> ∈ L, information from an input with level l<sub>1</sub> may influence an output with level l<sub>2</sub> only if l<sub>1</sub> ⊑ l<sub>2</sub>. Instead of the specification *low*(e), information flow assertions can therefore have the form levelBelow(e, l), meaning that the security level of expression e is at most l.

It is well-known that techniques for verifying information flow security with two levels can conceptually be used to verify programs with arbitrary finite security lattices [23] by splitting the verification task into |L| different verification tasks, one for each element of L. Instead, we propose to combine all these verification tasks into a single task by using a symbolic value for l, i.e., declaring an unconstrained global constant representing l. Specifications can then be translated as follows:

$$levelBelow(e, l') \hat{=} \; l' \sqsubseteq l \Rightarrow e^{(1)} = e^{(2)}$$

Since no information about l is known, verification will only succeed if all assertions can be proven for all possible values of l, which is equivalent to proving them separately for each possible value of l.
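The equivalence between the symbolic check and the per-level checks can be demonstrated on a small finite lattice (an illustrative sketch with a hypothetical diamond lattice; the helper names are ours): a universally quantified observer level is simulated by iterating over all levels.

```python
# A small diamond lattice {bot, a, b, top}; ORDER encodes the relation ⊑.
LEVELS = ["bot", "a", "b", "top"]
ORDER = {("bot", "bot"), ("bot", "a"), ("bot", "b"), ("bot", "top"),
         ("a", "a"), ("a", "top"), ("b", "b"), ("b", "top"),
         ("top", "top")}
def leq(l1, l2):
    return (l1, l2) in ORDER

# levelBelow(e, l') translated under a fixed observer level l:
#   l' ⊑ l  ==>  e_1 == e_2
def level_below_holds(lprime, l, e1, e2):
    return (not leq(lprime, l)) or e1 == e2

# Verifying with an unconstrained symbolic l is the same as verifying
# for every possible value of l separately.
def verify_symbolic(lprime, e1, e2):
    return all(level_below_holds(lprime, l, e1, e2) for l in LEVELS)

# Data at level "a" with unequal renamings fails for any observer level
# above "a", so the symbolic check fails.
assert not verify_symbolic("a", e1=1, e2=2)
# Equal renamings satisfy levelBelow for every observer level.
assert verify_symbolic("a", e1=1, e2=1)
```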

### **6.4 Declassification**

In practice, non-interference is too strong a property for many use cases. Often, some leakage of secret data is required for a program to work correctly. Consider

**Fig. 7.** Programs with a termination channel (left), and a timing channel (right). In both cases, h is high.

the case of a password check (see Fig. 6): A secret internal password is compared to a non-secret user input. While the password itself must not be leaked, the information whether the user input matches the password should influence the public outcome of the program, which is forbidden by non-interference.

To incorporate this intention, the relevant part of the secret information can be *declassified* [24], e.g., via a declassification statement declassify <sup>e</sup> that declares an arbitrary expression e to be low. With modular products, declassification can be encoded via a simple assumption stating that, if the declassification is executed in both executions, the expression is equal in both executions:

$$[\![\texttt{declassify}\ e]\!]^{\bar{p}}_{2} = \texttt{assume}\ (p^{(1)} \land p^{(2)}) \Rightarrow e^{(1)} = e^{(2)}$$

Introducing an assumption of this form is sound if the information flow specifications from Sect. 6.2 are used to specify the program. Since high-ness is encoded as the absence of the knowledge that an expression is equal in both executions, not by the knowledge that they are different, there is no danger that assuming equality will contradict current knowledge and thereby cause unsoundness. As in the information flow specifications, the declassified expression can be arbitrarily complex, so that it is for example possible to declassify the sign of an integer while keeping all other information about it secret.

The example in Fig. 6 becomes valid if we add declassify result at the end of the procedure, or if we declassify a more complex expression by adding declassify equal(password, input) at some earlier point. The latter would arguably be safer because it specifies exactly the information that is intended to be leaked, and would therefore prevent accidentally leaking more if the implementation of the checking loop was faulty.

This kind of declassification has the following interesting properties: First, it is *imperative*, meaning that the declassified information may be leaked (e.g., via a **print** statement) after the execution of the declassification statement, but not before. Second, it is *semantic*, meaning that the declassification affects the value of the declassified expression, as opposed to, e.g., syntactically declassifying occurrences of a variable. As a result, it is subsequently allowed to leak any expression whose value contains the same (or a part of the) secret information that was declassified, e.g., the expression f(e) if f is a deterministic function and e has been declassified.
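The "high-ness as absence of knowledge" reading of declassification can be sketched as follows (a conceptual toy, not the paper's encoding: low-ness is tracked as a set of expressions known equal in both executions, and the helper names are ours).

```python
# Knowledge-based sketch of declassification: an expression is known
# low only if both renamings are known equal; declassify adds exactly
# that knowledge, so it can never contradict existing facts.
known_equal = set()        # expressions assumed equal in both executions

def declassify(expr):
    # assume e_1 == e_2 for the declassified expression
    known_equal.add(expr)

def check_low(expr):
    # an output (lowEvent + low(e)) is allowed only for known-low expressions
    assert expr in known_equal, f"would leak high data: {expr}"

declassify("equal(password, input)")   # leak only the comparison result
check_low("equal(password, input)")    # printing the result is now allowed

try:
    check_low("password")              # the password itself stays high
    leaked = True
except AssertionError:
    leaked = False
assert not leaked
```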

### **6.5 Preventing Termination Channels**

In Definition 2, we have considered only terminating program executions. In practice, however, termination is a possible side-channel that can leak secret information to an outside observer. Figure 7 (left) shows an example of a program that verifies under the methodology presented so far, but leaks information about the secret input h to an observer: If h is initially negative, the program will enter an endless loop. Anyone who can observe the termination behavior of the program can therefore conclude if h was negative or not.

To prevent leaking information via a termination side channel, it is necessary to verify that the termination of a program depends only on public data. We will show that modular product programs are expressive enough to encode and check this property. We focus on preventing non-termination caused by infinite loops here; preventing infinite recursion works analogously. In particular, we want to prove that if a loop iterates forever in one execution, any other execution with the same low inputs will also reach this loop and iterate forever.


We propose to verify these properties by requiring additional specifications that state, for every loop, an exact condition under which it terminates. This condition may neither over- nor underapproximate the termination behavior; the loop must terminate if and only if the condition is true. For Fig. 7 (left), the condition is h ≥ 0. We also require a ranking function for the case where the termination condition is true. We can then prove Theorem 4 below.


**Fig. 8.** Program instrumentation for termination leak prevention. We abbreviate while (*e*) terminates(*ec, er*) do *{s}* as *w*.

We introduce an annotated while loop while (e) terminates(e*c*, e*r*) do {s}, where e*<sup>c</sup>* is the exact termination condition and e*<sup>r</sup>* is the ranking function, i.e., an integer expression whose value decreases with every loop iteration but never becomes negative if the termination condition is true. Based on these annotations, we present a program instrumentation *term* (s, c) that inserts the checks outlined above for every while loop in s. c is the termination condition of the outside scope, i.e., for the instrumentation of a nested loop, it is the termination condition e*<sup>c</sup>* of the outer loop. The instrumentation is defined for annotated while loops in Fig. 8; for all other statements, it does not make any changes except instrumenting all substatements. The instrumentation uses information flow assertions as defined in Sect. 6.2. Again, we make use of the fact that modular products allow checking relational assertions at arbitrary program points and formulating assertions about the control flow.
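The checks inserted for an annotated loop can be mimicked at runtime (a sketch under assumptions: we take the left program of Fig. 7 to have the loop body h = h − 1, termination condition h ≥ 0, ranking function h, and we use a fuel bound plus `assert` statements as stand-ins for the verifier's static checks).

```python
# Runtime analogue of the term(s, c) checks for an annotated loop
#   while (e) terminates(e_c, e_r) do {s}
# instantiated for the left program of Fig. 7.
def run_annotated_loop(h, fuel=10_000):
    cond = h >= 0                  # exact termination condition e_c
    while h != 0:
        if cond:
            rank = h               # ranking function e_r before the body
            assert rank >= 0, "ranking function went negative"
        h = h - 1                  # loop body
        if cond:
            assert h < rank, "ranking function did not decrease"
        fuel -= 1
        if fuel == 0:
            # the loop may only run forever if e_c was false
            assert not cond, "loop failed to terminate although e_c held"
            return None
    return h

assert run_annotated_loop(5) == 0      # terminates: condition was true
assert run_annotated_loop(-3) is None  # diverges: condition was false
```

The two `assert`s on the ranking function correspond to check (1) (the loop terminates if e<sub>c</sub> holds), and the fuel-exhaustion check approximates check (2) (it loops forever only if e<sub>c</sub> is false).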

We now prove that if an instrumented statement verifies under some 2 relational precondition then any two runs from a pair of states fulfilling that precondition will either both terminate or both loop forever.

**Theorem 4.** *If* s′ = *term*(s, *false*)*, and* ⟦s′⟧<sup>p̄</sup><sub>2</sub> *verifies under some precondition* P = ⟦P̂⟧<sup>p̄</sup><sub>2</sub>*, and for some* σ<sub>1</sub>, σ<sub>2</sub>, σ′<sub>1</sub>*,* (σ<sub>1</sub>, σ<sub>2</sub>) ⊨ P̂ *and* ⟨s′, σ<sub>1</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>1</sub>⟩*, then there exists some* σ′<sub>2</sub> *s.t.* ⟨s′, σ<sub>2</sub>⟩ →<sup>∗</sup> ⟨skip, σ′<sub>2</sub>⟩*.*

*Proof (Sketch).* We first establish that our instrumentation ensures that each statement terminates (1) if and (2) only if its termination condition is true, (1) by showing equivalence to a standard termination proof, and (2) by a contradiction if a loop which should not terminate does. Since the execution from σ<sup>1</sup> terminates, by the second condition, its termination condition must have been true before the loop. We case split on whether the other execution also reaches the loop or not. If it does then the termination condition before the loop is identical in both executions, so by the first condition, the other execution also terminates. If it does not then the loop is not executed at all by the other execution, and therefore cannot cause non-termination.

### **6.6 Preventing Timing Channels**

A program has a *timing channel* if high input data influences the program's execution time, meaning that an attacker who can observe the time the program executes can gain information about those secrets. Timing channels can occur in combination with observable events; the time at which an event occurs may depend on a secret even if the overall execution time of a program does not.

Consider the example in Fig. 7 (right). Assuming main receives a positive secret h, both the **print** statement and the end of the program execution will be reached later for larger values of h.

Using modular product programs, we can verify the absence of timing side channels by adding ghost state to the program that tracks the time passed since the program has started; this could, for example, be achieved via a simple step counting mechanism, or by tracking the sequence of previously executed bytecode statements. This ghost state is updated separately for both executions. We can then assert anywhere in the program that the passed time does not depend on high data in the same way we do for program variables. In particular, we can enforce that the passed time is equal whenever an observable event occurs, and we can enable users to write relational specifications that compare the time passed in both executions of a loop or a procedure.
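The step-counting idea can be sketched as follows (illustrative only: we assume the right program of Fig. 7 runs a loop whose iteration count depends on the secret h before an observable print, and we use a runtime `assert` in place of the product's relational check on the ghost counters).

```python
# Step-counting sketch of timing-channel detection: each execution
# tracks the steps taken so far; the product asserts the counters are
# equal whenever an observable event occurs.
def run(h):
    steps = 0
    i = 0
    while i < h:          # execution time depends on the secret h
        i += 1
        steps += 1
    return steps          # "time" at which the print event would occur

def product_event_check(h1, h2):
    t1, t2 = run(h1), run(h2)
    assert t1 == t2, "timing channel: event time depends on high data"

try:
    product_event_check(3, 7)     # two executions with different secrets
    timing_leak = False
except AssertionError:
    timing_leak = True
assert timing_leak                # the relational check detects the channel
```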

### **7 Implementation and Evaluation**

We have implemented our approach for secure information flow in the Viper verification infrastructure [22] and applied it to a number of example programs from the literature. Both the implementation and examples are available at http:// viper.ethz.ch/modularproducts/.

### **7.1 Implementation in Viper**

Our implementation supports a version of the Viper language extended with the information flow assertions *low*(e) and *lowEvent* from Sect. 6.2, the declassification statement from Sect. 6.4, and the loop termination annotations from Sect. 6.5.


The implementation transforms a program in this extended language into a modular 2-product in the original language, which can then be verified by the (unmodified) Viper back-end verifiers. All specifications are provided as information flow specifications (see Sect. 6.2) such that users require no knowledge about the transformation or the methodology behind information flow verification. Error messages are automatically translated back to the original program.

Declassification is implemented as described in Sect. 6.4. Our implementation optionally verifies the absence of timing channels; the metric chosen for tracking execution time is simple step-counting. Viper uses implicit dynamic frames [25] to reason about heap-manipulating programs; our implementation uses quantified permissions [21] to support unbounded heap data structures.

For languages with opaque object references, secure information flow can require that pointers are low, i.e., equal up to a consistent renaming of addresses. Therefore, our approach to duplicating the heap state space in the implementation differs from that described in Sect. 4.3: Instead of duplicating objects, our implementation creates a single new statement for every new in the original program, but duplicates the fields each object has. As a result, if both executions execute the same new statement, the newly created object will be considered low afterwards (but the values of its fields might still be high).

### **7.2 Qualitative Evaluation**

We have evaluated our implementation by verifying a number of examples in the extended Viper language. The examples are listed in Table 1 and include all code snippets shown in this paper as well as a number of examples from the literature [2,3,6,13,14,17,18,23,26,28]. They combine complex language features like mutable state on the heap, arrays and procedure calls, as well as timing and termination channels, declassification, and non-trivial information flows (e.g., flows whose legality depends on semantic information not available in a standard information flow type system). We manually added pre- and postconditions as well as loop invariants; for those that have forbidden flows and therefore should not verify, we also added a legal version that declassifies the leaked information. Our implementation returns the correct result for all examples.

In all cases but one, our approach allows us to express all information flow related assertions, i.e., procedure specifications and loop invariants, purely as relational specifications in terms of *low*-assertions (see Table 1). For all these examples, we completely avoid the need to specify the functional behavior of the program. Unlike the original product program paper [6], we also do not inline any procedure calls; verification is completely modular.

The only exception is an example that, depending on a high input, executes different loops with identical behavior, and for which we need to prove that the execution time is low. In this case we have to provide invariants for both loops that exactly specify their execution time in order to prove that the overall execution time after the conditional is low. Nevertheless, the specification of the procedure containing the loop is again expressed with a relational specification using only *low*. For all other examples, unary specifications were only needed to verify the absence of runtime errors (e.g., out-of-bounds array accesses), which Viper verifies by default. Consequently, a verified program cannot leak secret data through such errors, which is typically not guaranteed by type systems or static analyses.


**Table 1.** Evaluated examples. We show the used language features, lines of code including specifications, overall lines used for specifications (Ann), unary specifications for safety (SF), relational specifications for non-interference (NI), specifications for termination (TM), and functional specifications required for non-interference (F). Note that some lines contain specifications belonging to multiple categories. Columns T<sub>SE</sub> and T<sub>VCG</sub> show the running times of the verifiers for the SE backend and the VCG backend, respectively, in seconds.

### **7.3 Performance**

For all but one example, the runtime (averaged over 10 runs on a Lenovo ThinkPad T450s running Ubuntu) with both the Symbolic Execution (SE) and the Verification Condition Generation (VCG) verifiers is under or around one second (see Table 1). The one exception, which makes extensive use of unbounded heap data structures, takes ca. five seconds when verified using VCG, and 15 in the SE verifier. This is likely a result of inefficiencies in our encoding: The created product has a high number of branching statements, and some properties have to be proved more than once, two issues which have a much larger performance impact for SE than for VCG. We believe that it is feasible to remove much of this overhead by optimizing the encoding; we leave this as future work.

### **8 Related Work**

The notion of k-safety hyperproperties was originally introduced by Clarkson and Schneider [12]. Here, we focus on statically proving hyperproperties for imperative and object-oriented programs; much more work exists for testing or monitoring hyperproperties like secure information flow at runtime, or for reasoning about hyperproperties in different programming paradigms.

Relational logics such as Relational Hoare Logic [11], Relational Separation Logic [29] and others [1,10] allow reasoning directly about relational properties of two different program executions. Unlike our approach, they usually allow reasoning about the executions of two *different* programs; as a result, they do not give special support for two executions of the same program calling the same procedure with a relational specification. Recently, Banerjee et al. [5] introduced biprograms, which allow explicitly expressing alignment between executions and using relational specifications to reason about aligned calls; however, this approach requires that procedures with relational specifications are always called by both executions, which is for instance not the case if a call occurs under a high guard in secure information flow verification. We handle such cases by interpreting relational specifications as trivially true; one can then still resort to functional specifications to complete the proof. Their work also does not allow mixed specifications, which are easily supported in our product programs. Relational program logics are generally difficult to automate. Recent work by Sousa and Dillig [27] presents a logic that can be applied automatically by an algorithm that implicitly constructs different product programs that align *some* identical statements, but does not fully support relational specifications. Moreover, their approach requires dedicated tool support, whereas our modular product programs can be verified using off-the-shelf tools.

The approach of reducing hyperproperties to ordinary trace properties was introduced by self-composition [9]. While self-composition is theoretically complete, it does not allow modular reasoning with relational specifications. The resulting problem of having to fully specify program behavior was pointed out by Terauchi and Aiken [28]; since then, there have been a number of different attempts to solve this problem by allowing (parts of) programs to execute in lock-step. Terauchi and Aiken [28] did this for secure information flow by relying on information from a type system; other similar approaches exist [23].

Product programs [6,7] allow different interleavings of program executions. The initial product program approach [6] would in principle allow the use of relational specifications for procedure calls, but only under the restriction that both program executions always follow the same control flow. The generalized approach [7] allows combining different programs and arbitrary numbers of executions. This product construction is non-deterministic and usually interactive. In some (but not all) cases, programmers can manually construct product programs that avoid duplicated calls and loops and thereby allow using relational specifications. However, whether this is possible depends on the used specification, meaning that the product construction and verification are intertwined and a new product has to be constructed when specifications change. In contrast, our new product construction is fully deterministic and automatic, allows arbitrary control flows while still being able to use relational specifications for all loops and calls, and therefore avoids the issue of requiring full functional specifications.

Considerable work has been invested into proving specific hyperproperties like secure information flow. One popular approach is the use of type systems [26]; while those are modular and offer good performance, they overapproximate possible program behaviors and are therefore less precise than approaches using logics. In particular, they require labeling any single value as either high or low, and do not allow distinctions like the one we made for the example in Fig. 1, where only the first bits of a sequence of integers were low. In addition, type systems typically struggle to prevent information leaks via side channels like termination or program aborts. There have been attempts to create type systems that handle some of these limitations (e.g. [15]).

Static analyses [2,17] enable fully automatic reasoning. They are typically not modular and, similarly to type systems, need to abstract semantic information, which can lead to false positives. They strike a trade-off different from our solution, which requires specifications, but enables precise, modular reasoning.

A number of logic-based approaches to proving specific hyperproperties exist. As an example, Darvas et al. use dynamic logic for proving non-interference [14]; this approach offers some automation, but requires user interaction for most realistic programs. Leino et al. [19] verify determinism up to equivalence using self-composition, which suffers from the drawbacks explained above.

Different kinds of declassification have been studied extensively, Sabelfeld and Sands [24] provide a good overview. Li and Zdancewic [20] introduce downgrading policies that describe which information can be declassified and, similar to our approach, can do so for arbitrary expressions.

### **9 Conclusion and Future Work**

We have presented modular product programs, a novel form of product programs that enable modular reasoning about k-safety hyperproperties using relational specifications with off-the-shelf verifiers. We showed that modular products are expressive enough to handle advanced aspects of secure information flow verification. They can prove the absence of termination and timing side channels and encode declassification. Our implementation shows that our technique works in practice on a number of challenging examples from the literature, and exhibits good performance even without optimizations.

For future work, we plan to infer relational properties by using standard program analysis techniques on the products. We also plan to generalize our technique to prove probabilistic secure information flow for concurrent programs by combining our encoding with ideas from concurrent separation logic. Finally, we plan to optimize our encoding to further improve performance.

**Acknowledgements.** We would like to thank Toby Murray and David Naumann for various helpful discussions. We are grateful to the anonymous reviewers for their valuable comments. We also gratefully acknowledge support from the Zurich Information Security and Privacy Center (ZISC).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Program Verification

## **A Fistful of Dollars: Formalizing Asymptotic Complexity Claims via Deductive Program Verification**

Armaël Guéneau<sup>1</sup>, Arthur Charguéraud<sup>1,2</sup>, and François Pottier<sup>1(B)</sup>

<sup>1</sup> Inria, Paris, France francois.pottier@inria.fr <sup>2</sup> Université de Strasbourg, CNRS, ICube UMR 7357, Strasbourg, France

**Abstract.** We present a framework for simultaneously verifying the functional correctness and the worst-case asymptotic time complexity of higher-order imperative programs. We build on top of Separation Logic with Time Credits, embedded in an interactive proof assistant. We formalize the O notation, which is key to enabling modular specifications and proofs. We cover the subtleties of the multivariate case, where the complexity of a program fragment depends on multiple parameters. We propose a way of integrating complexity bounds into specifications, present lemmas and tactics that support a natural reasoning style, and illustrate their use with a collection of examples.

### **1 Introduction**

A program or program component whose functional correctness has been verified might nevertheless still contain complexity bugs: that is, its performance, in some scenarios, could be much poorer than expected.

Indeed, many program verification tools only guarantee partial correctness, that is, do not even guarantee termination, so a verified program could run forever. Some program verification tools do enforce termination, but usually do not allow establishing an explicit complexity bound. Tools for automatic complexity inference can produce complexity bounds, but usually have limited expressive power.

In practice, many complexity bugs are revealed by testing. Some have also been detected during ordinary program verification, as shown by Filliâtre and Letouzey [14], who find a violation of the balancing invariant in a widely distributed implementation of binary search trees. Nevertheless, none of these techniques can guarantee, with a high degree of assurance, the absence of complexity bugs in software.

To illustrate the issue, consider the binary search implementation in Fig. 1. Virtually every modern software verification tool allows proving that this OCaml

This research was partly supported by the French National Research Agency (ANR) under the grant ANR-15-CE25-0008.

© The Author(s) 2018

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 533–560, 2018.

https://doi.org/10.1007/978-3-319-89884-1_19

code (or analogous code, expressed in another programming language) satisfies the specification of a binary search and terminates on all valid inputs. This code might even pass a lightweight testing process, as some search queries will be answered very quickly, even if the array is very large. Yet, a more thorough testing process would reveal a serious issue: a search for a value that is stored in the second half of the range [i, j) takes linear time. It would be embarrassing if such faulty code was deployed, as it would aggravate benevolent users and possibly allow malicious users to mount denial-of-service attacks.

```
(* Requires t to be a sorted array of integers.
   Returns k such that i <= k < j and t.(k) = v
   or -1 if there is no such k. *)
let rec bsearch t v i j =
  if j <= i then -1 else
    let k = i + (j - i) / 2 in
    if v = t.(k) then k
    else if v < t.(k) then bsearch t v i k
    else bsearch t v (i+1) j
```
**Fig. 1.** A flawed binary search. This code is provably correct and terminating, yet exhibits linear (instead of logarithmic) time complexity for some input parameters.
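The flaw is in the final recursive call: after comparing against t.(k), the search of the right half must resume at k+1, not i+1. A repaired version, transliterated to Python for illustration (our sketch; the one-line fix applies identically to the OCaml code of Fig. 1):

```python
def bsearch(t, v, i, j):
    """Search a sorted array t for v within the range [i, j).
    Return k such that i <= k < j and t[k] == v, or -1 if no such k."""
    if j <= i:
        return -1
    k = i + (j - i) // 2
    if v == t[k]:
        return k
    elif v < t[k]:
        return bsearch(t, v, i, k)
    else:
        # The flawed code recurses on (i + 1, j), shrinking the range by
        # only one element and yielding linear-time behavior; recursing
        # on (k + 1, j) halves the range, restoring O(log (j - i)) cost.
        return bsearch(t, v, k + 1, j)
```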

As illustrated above, complexity bugs can affect execution time, but could also concern space (including heap space, stack space, and disk space) or other resources, such as the network, energy, and so on. In this paper, for simplicity, we focus on execution time only. That said, much of our work is independent of which resource is considered. We expect that our techniques could be adapted to verify asymptotic bounds on the use of other non-renewable resources, such as the network.

We work with a simple model of program execution, where certain operations, such as calling a function or entering a loop body, cost one unit of time, and every other operation costs nothing. Although this model is very remote from physical running time, it is independent of the compiler, operating system, and hardware [18,24] and still allows establishing asymptotic time complexity bounds, and therefore, detecting complexity bugs—situations where a program is asymptotically slower than it should be.

In prior work [11], the second and third authors present a method for verifying that a program satisfies a specification that includes an explicit bound on the program's worst-case, amortized time complexity. They use Separation Logic with Time Credits, a simple extension of Separation Logic [23] where the assertion $1 represents a permission to perform one step of computation, and is consumed when exercised. The assertion $n is a separating conjunction of n such time credits. Separation Logic with Time Credits is implemented in the second author's interactive verification framework, CFML [9,10], which is embedded in the Coq proof assistant.

Using CFML, the second and third authors verify the correctness and time complexity of an OCaml implementation of the Union-Find data structure [11]. However, their specifications involve *concrete* cost functions: for instance, the precondition of the function *find* indicates that calling *find* requires and consumes $(2α(n) + 4), where n is the current number of elements in the data structure, and where α denotes an inverse of Ackermann's function. We would prefer the specification to give the *asymptotic* complexity bound O(α(n)), which means that, for *some* function f ∈ O(α(n)), calling *find* requires and consumes $f(n). This is the purpose of this paper.

We argue that the use of asymptotic bounds, such as O(α(n)), is necessary for (verified or unverified) complexity analysis to be applicable at scale. At a superficial level, it reduces clutter in specifications and proofs: O(mn) is more compact and readable than 3mn + 2n log n + 5n + 3m + 2. At a deeper level, it is crucial for stating modular specifications, which hide the details of a particular implementation. Exposing the fact that *find* costs 2α(n) + 4 is undesirable: if a tiny modification of the Union-Find module changes this cost to 2α(n) + 5, then all direct and indirect clients of the Union-Find module must be updated, which is intolerable. Furthermore, sometimes, the constant factors are unknown anyway. Applying the Master Theorem [12] to a recurrence equation only yields an order of growth, not a concrete bound. Finally, for most practical purposes, no critical information is lost when concrete bounds such as 2α(n) + 4 are replaced with asymptotic bounds such as O(α(n)). Indeed, the number of computation steps that take place at the source level is related to physical time only up to a hardware- and compiler-dependent constant factor. The use of asymptotic complexity in the analysis of algorithms, initially advocated by Hopcroft and by Tarjan, has been widely successful and is nowadays standard practice.

One must be aware of several limitations of our approach. First, it is not a worst-case execution time (WCET) analysis: it does not yield bounds on actual physical execution time. Second, it is not fully automated. We place emphasis on expressiveness, as opposed to automation. Our vision is that verifying the functional correctness *and* time complexity of a program, at the same time, should not involve much more effort than verifying correctness alone. Third, we control only the growth of the cost as the parameters grow large. A loop that counts up from 0 to 2<sup>60</sup> has complexity O(1), even though it typically won't terminate in a lifetime. Although this is admittedly a potential problem, traditional program verification falls prey to analogous pitfalls: for instance, a program that attempts to allocate and initialize an array of size (say) 2<sup>48</sup> can be proved correct, even though, on contemporary desktop hardware, it will typically fail for lack of memory. We believe that there is value in our approach in spite of these limitations.

Reasoning and working with asymptotic complexity bounds is not as simple as one might hope. As demonstrated by several examples in Sect. 2, typical paper proofs using the O notation rely on informal reasoning principles which can easily be abused to prove a contradiction. Of course, using a proof assistant steers us clear of this danger, but implies that our proofs cannot be quite as simple and perhaps cannot have quite the same structure as their paper counterparts.

A key issue that we run up against is the handling of existential quantifiers. According to what was said earlier, the specification of a sorting algorithm, say *mergesort*, should be, roughly: "there exists a cost function f ∈ O(λn.n log n) such that *mergesort* is content with $f(n), where n is the length of the input list." Therefore, the very first step in a naïve proof of *mergesort* must be to exhibit a witness for f, that is, a concrete cost function. An appropriate witness might be λn.2n log n, or λn.n log n + 3, who knows? This information is not available up front, at the very *beginning* of the proof; it becomes available only *during* the proof, as we examine the code of *mergesort*, step by step. It is not reasonable to expect the human user to guess such a witness. Instead, it seems desirable to *delay* the production of the witness and to *gradually* construct a cost expression as the proof progresses. In the case of a nonrecursive function, such as *insertionsort*, the cost expression, once fully synthesized, yields the desired witness. In the case of a recursive function, such as *mergesort*, the cost expression yields the body of a recurrence equation, whose solution is the desired witness.

We make the following contributions:


Our code can be found online in the form of two standalone Coq libraries and a self-contained archive [16].

### **2 Challenges in Reasoning with the** *O* **Notation**

When informally reasoning about the complexity of a function, or of a code block, it is customary to make assertions of the form "this code has asymptotic complexity O(1)", "that code has asymptotic complexity O(n)", and so on. Yet, these assertions are too informal: they do not have a sufficiently precise meaning, and can easily be abused to produce flawed paper proofs.

A striking example appears in Fig. 2, which shows how one might "prove" that a recursive function has complexity O(1), whereas its actual cost is O(n). The flawed proof exploits the (valid) relation O(1) + O(1) = O(1), which means that a sequence of two constant-time code fragments is itself a constant-time code fragment. The flaw lies in the fact that the O notation hides an existential quantification, which is inadvertently swapped with the universal quantification over the parameter n. Indeed, the claim is that "there exists a constant c such that, for every n, waste(n) runs in at most c computation steps". However, the proposed proof by induction establishes a much weaker result, to wit: "for every n, there exists a constant c such that waste(n) runs in at most c steps". This result is certainly true, yet does not entail the claim.
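The linear cost of waste is easy to observe concretely. In the sketch below (ours, not from the paper), we count one cost unit per function call, as in the model of Sect. 1, and find exactly n + 1 calls for waste(n):

```python
def waste_cost(n):
    """Count the function calls performed by waste(n) of Fig. 2:
    each call either stops (n <= 0) or recurses on n - 1,
    so waste(n) performs exactly n + 1 calls when n >= 0."""
    calls = 1                  # the initial call to waste(n)
    while n > 0:
        calls += 1             # one recursive call: waste(n - 1)
        n -= 1
    return calls
```

The count grows linearly with n, contradicting the O(1) claim.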

An example of a different nature appears in Fig. 3. There, the auxiliary function g takes two integer arguments n and m and involves two nested loops, over the intervals [1, n] and [1, m]. Its asymptotic complexity is O(n + nm), which, *under the hypothesis that* m *is large enough*, can be simplified to O(nm). The reasoning, thus far, is correct. The flaw lies in our attempt to substitute 0 for m

**Incorrect claim:** The OCaml function waste has asymptotic complexity O(1).

```
let rec waste n = if n > 0 then waste (n-1)
```

#### **Flawed proof:**

Let us prove by induction on n that waste(n) costs O(1).

**–** If n ≤ 0, then waste(n) returns immediately, at cost O(1).

**–** If n > 0, then waste(n) performs one function call, at cost O(1), followed by waste(n − 1), which costs O(1) by the induction hypothesis. Since O(1) + O(1) = O(1), the total cost is O(1).
**Fig. 2.** A flawed proof that waste(n) costs O(1), when its actual cost is O(n).

**Incorrect claim:** The OCaml function f has asymptotic complexity O(1).

```
let g (n, m) =
  for i=1 to n do
    for j=1 to m do () done
  done
let f n = g (n, 0)
```
### **Flawed proof:**

**–** g(n, m) involves nm inner loop iterations, thus costs O(nm).

**–** The cost of f(n) is the cost of g(n, 0), plus O(1). As the cost of g(n, m) is O(nm), we find, by substituting 0 for m, that the cost of g(n, 0) is O(0). Thus, f(n) is O(1).

**Fig. 3.** A flawed proof that f(n) costs O(1), when its actual cost is O(n).

**Incorrect claim:** The OCaml function h has asymptotic complexity O(nm<sup>2</sup>).

```
let h (m, n) =
  for i = 0 to m-1 do
    let p = (if i = 0 then pow2 n else n*i) in
    for j = 1 to p do () done
  done
```
#### **Flawed proof:**


$$\sum\_{i=0}^{m-1} O(ni) = O\left(n \cdot \sum\_{i=0}^{m-1} i\right) = O(nm^2).$$

**Fig. 4.** A flawed proof that h(m, n) costs O(nm<sup>2</sup>), when its actual cost is O(2<sup>n</sup> + nm<sup>2</sup>).

in the bound O(nm). Because this bound is valid only for sufficiently large m, it does not make sense to substitute a specific value for m. In other words, from the fact that "g(n, m) costs O(nm) when n and m are sufficiently large", one *cannot* deduce anything about the cost of g(n, 0). To repair this proof, one must take a step back and prove that g(n, m) has asymptotic complexity O(n + nm) *for sufficiently large* n *and for every* m. This fact *can* be instantiated with m = 0, allowing one to correctly conclude that g(n, 0) costs O(n). We come back to this example in Sect. 3.3.
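This repaired reasoning is easy to sanity-check by counting loop iterations, at one cost unit per iteration as in our model (a Python sketch of our own):

```python
def g_cost(n, m):
    """Iterations performed by g (n, m) of Fig. 3: the outer loop runs
    n times and the inner loop runs m times per outer iteration,
    for a total cost of n + n*m."""
    cost = 0
    for i in range(1, n + 1):
        cost += 1                  # one outer-loop iteration
        for j in range(1, m + 1):
            cost += 1              # one inner-loop iteration
    return cost
```

In particular, g_cost(n, 0) equals n: the cost of f(n) = g(n, 0) is linear in n, matching the bound O(n) obtained by instantiating m with 0, and certainly not O(1).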

One last example of tempting yet invalid reasoning appears in Fig. 4. We borrow it from Howell [19]. This flawed proof exploits the dubious idea that "the asymptotic cost of a loop is the sum of the asymptotic costs of its iterations". In more precise terms, the proof relies on the following implication, where f(m, n, i) represents the true cost of the i-th loop iteration and g(m, n, i) represents an asymptotic bound on f(m, n, i):

$$f(m,n,i) \in O(g(m,n,i)) \quad \Rightarrow \quad \sum\_{i=0}^{m-1} f(m,n,i) \in O\left(\sum\_{i=0}^{m-1} g(m,n,i)\right)$$

As pointed out by Howell, this implication is in fact invalid. Here, f(m, n, 0) is 2<sup>n</sup> and f(m, n, i) when i > 0 is ni, while g(m, n, i) is just ni. The left-hand side of the above implication holds, but the right-hand side does not, as 2<sup>n</sup> + Σ<sub>i=1</sub><sup>m−1</sup> ni is O(2<sup>n</sup> + nm<sup>2</sup>), not O(nm<sup>2</sup>). The Summation lemma presented later on in this paper (Lemma 8) rules out the problem by adding the requirement that f be a nondecreasing function of the loop index i. We discuss in depth later on (Sect. 4.5) why cost functions should and can be monotonic.
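The failure of the implication can also be checked numerically: with m held fixed, the ratio between the true total cost and the claimed bound nm<sup>2</sup> grows without bound (a Python spot-check of our own):

```python
def true_total(m, n):
    # Sum of the true iteration costs f(m, n, i):
    # f(m, n, 0) = 2^n, and f(m, n, i) = n*i for 0 < i < m.
    return 2 ** n + sum(n * i for i in range(1, m))

def claimed_bound(m, n):
    # The (incorrect) asymptotic bound n * m^2, read as a concrete function.
    return n * m ** 2

# With m fixed at 3, the ratio true_total / claimed_bound diverges as n
# grows, so the total cost is not O(n * m^2).
ratios = [true_total(3, n) / claimed_bound(3, n) for n in (5, 10, 20, 40)]
assert all(r1 < r2 for r1, r2 in zip(ratios, ratios[1:]))
```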

The examples that we have presented show that the informal reasoning style of paper proofs, where the O notation is used in a loose manner, is unsound. One cannot hope, in a formal setting, to faithfully mimic this reasoning style. In this paper, we do assign O specifications to functions, because we believe that this style is elegant, modular and scalable. However, during the analysis of a function body, we abandon the O notation. We first synthesize a cost expression for the function body, then check that this expression is indeed dominated by the asymptotic bound that appears in the specification.

### **3 Formalizing the** *O* **Notation**

### **3.1 Domination**

In many textbooks, the fact that f is bounded above by g asymptotically, up to constant factor, is written "f = O(g)" or "f ∈ O(g)". However, the former notation is quite inappropriate, as it is clear that "f = O(g)" cannot be literally understood as an equality. Indeed, if it truly were an equality, then, by symmetry and transitivity, f<sub>1</sub> = O(g) and f<sub>2</sub> = O(g) would imply f<sub>1</sub> = f<sub>2</sub>. The latter notation makes much better sense: O(g) is then understood as a set of functions. This approach has in fact been used in formalizations of the O notation [3]. Yet, in this paper, we prefer to think directly in terms of a *domination* preorder between functions. Thus, instead of "f ∈ O(g)", we write f ≼ g.

Although the O notation is often defined in the literature only in the special case of functions whose domain is N, Z or R, we must define domination in the general case of functions whose domain is an arbitrary type A. By later instantiating A with a product type, such as Z<sup>k</sup>, we get a definition of domination that covers the multivariate case. Thus, let us fix a type A, and let f and g inhabit the function type A → Z.<sup>1</sup>

Fixing the type A, it turns out, is not quite enough. In addition, the type A must be equipped with a *filter* [6]. To see why that is the case, let us work towards the definition of domination. As is standard, we wish to build a notion of "growing large enough" into the definition of domination. That is, instead of requiring a relation of the form |f(x)| ≤ c |g(x)| to be "everywhere true", we require it to be "ultimately true", that is, "true when x is large enough".<sup>2</sup> Thus, f ≼ g should mean, roughly:

"up to a constant factor, ultimately, |f| is bounded above by |g|."

That is, somewhat more formally:

"for some c, for every sufficiently large x, |f(x)| ≤ c |g(x)|"

In mathematical notation, we would like to write: ∃c. U x. |f(x)| ≤ c |g(x)|. For such a formula to make sense, we must define the meaning of the formula U x. P, where x inhabits the type A. This is the reason why the type A must be

<sup>1</sup> At this time, we require the codomain of f and g to be Z. Following Avigad and Donnelly [3], we could allow it to be an arbitrary nondegenerate ordered ring. We have not yet needed this generalization.

<sup>2</sup> When A is N, provided g(x) is never zero, requiring the inequality to be "everywhere true" is in fact the same as requiring it to be "ultimately true". Outside of this special case, however, requiring the inequality to hold everywhere is usually too strong.

equipped with a filter U, which intuitively should be thought of as a quantifier, whose meaning is "ultimately". Let us briefly defer the definition of a filter (Sect. 3.2) and sum up what has been explained so far:

**Definition 1 (Domination).** *Let* A *be a filtered type, that is, a type* A *equipped with a filter* U<sub>A</sub>*.*

*The relation* ≼<sub>A</sub> *on* A → Z *is defined as follows:*

$$\begin{array}{rcl} f \preceq\_A g & \equiv & \exists c. \; \mathbb{U}\_A \; x. \; |f(x)| \le c \; |g(x)|. \end{array}$$

### **3.2 Filters**

Whereas ∀x. P means that P holds of *every* x, and ∃x. P means that P holds of *some* x, the formula U x. P should be taken to mean that P holds of every *sufficiently large* x, that is, P *ultimately* holds.

The formula U x. P is short for U (λx.P). If x ranges over some type A, then U must have type P(P(A)), where P(A) is short for A → Prop. Although Bourbaki [6] describes a filter as "a set of subsets of A", it is crucial to note that P(P(A)) is also the type of a quantifier in higher-order logic.

**Definition 2 (Filter).** *A* filter [6] *on a type* A *is an object* U *of type* P(P(A)) *that enjoys the following four properties, where* U x. P *is short for* U (λx.P)*:*

*1.* (∀x. P<sub>1</sub>(x) ⇒ P<sub>2</sub>(x)) ⇒ (U x. P<sub>1</sub>(x)) ⇒ (U x. P<sub>2</sub>(x))
*2a.* U x. True
*2b.* (U x. P<sub>1</sub>(x)) ⇒ (U x. P<sub>2</sub>(x)) ⇒ (U x. P<sub>1</sub>(x) ∧ P<sub>2</sub>(x))
*3.* (U x. P(x)) ⇒ ∃x. P(x)
Properties (1)–(3) are intended to ensure that the intuitive reading of U x. P as "for sufficiently large x, P holds" makes sense. Property (1) states that if P<sub>1</sub> implies P<sub>2</sub> and if P<sub>1</sub> holds when x is large enough, then P<sub>2</sub>, too, should hold when x is large enough. Properties (2a) and (2b), together, state that if each of P<sub>1</sub>,...,P<sub>k</sub> independently holds when x is large enough, then P<sub>1</sub>,...,P<sub>k</sub> should simultaneously hold when x is large enough. Properties (1) and (2b) together imply ∀x. P ⇒ U x. P. Property (3) states that if P holds when x is large enough, then P should hold of some x. In classical logic, it would be equivalent to ¬(U x. False).

In the following, we let the metavariable A stand for a *filtered type*, that is, a pair of a carrier type and a filter on this type. By abuse of notation, we also write A for the carrier type. (In Coq, this is permitted by an implicit projection.) We write U<sub>A</sub> for the filter.

### **3.3 Examples of Filters**

When U is a *universal filter*, U x. Q(x) is (by definition) equivalent to ∀x. Q(x). Thus, a predicate Q is "ultimately true" if and only if it is "everywhere true". In other words, the universal quantifier is a filter.

**Definition 3 (Universal filter).** *Let* T *be a nonempty type. Then* λQ.∀x.Q(x) *is a filter on* T*.*

When U is the *order filter* associated with the ordering ≤, the formula U x. Q(x) means that, when x becomes sufficiently large with respect to ≤, the property Q(x) becomes true.

**Definition 4 (Order filter).** *Let* (T, ≤) *be a nonempty ordered type, such that every two elements have an upper bound. Then* λQ. ∃x<sub>0</sub>. ∀x ≥ x<sub>0</sub>. Q(x) *is a filter on* T*.*

The order filter associated with the ordered type (Z, ≤) is the most natural filter on the type Z. Equipping the type Z with this filter yields a filtered type, which, by abuse of notation, we also write Z. Thus, the formula U<sub>Z</sub> x. Q(x) means that Q(x) becomes true "as x tends towards infinity".

By instantiating Definition 1 with the filtered type Z, we recover the classic definition of domination between functions from Z to Z:

$$f \preceq\_{\mathbb{Z}} g \iff \exists c. \exists n\_0. \forall n \ge n\_0. \left| f(n) \right| \le c \left| g(n) \right|.$$
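For instance, f(n) = 3n + 5 is dominated by g(n) = n, with witnesses c = 4 and n<sub>0</sub> = 5, since 3n + 5 ≤ 4n holds whenever n ≥ 5. A finite spot-check of the defining inequality (our own Python sketch; such a check can refute candidate witnesses, but of course cannot replace the universally quantified proof):

```python
def check_witnesses(f, g, c, n0, hi):
    """Check |f(n)| <= c * |g(n)| for every n in [n0, hi]: a finite
    sanity check of candidate witnesses (c, n0) for f <= g."""
    return all(abs(f(n)) <= c * abs(g(n)) for n in range(n0, hi + 1))

# 3n + 5 <= 4n holds for all n >= 5.
assert check_witnesses(lambda n: 3 * n + 5, lambda n: n, c=4, n0=5, hi=10_000)
# No constant works for n^2 against n: c = 1000 already fails at n = 1001.
assert not check_witnesses(lambda n: n * n, lambda n: n, c=1000, n0=1, hi=2_000)
```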

We now turn to the definition of a filter on a product type A<sub>1</sub> × A<sub>2</sub>, where A<sub>1</sub> and A<sub>2</sub> are filtered types. Such a filter plays a key role in defining domination between functions of several variables. The following *product filter* is the most natural construction, although there are others:

**Definition 5 (Product filter).** *Let* A<sub>1</sub> *and* A<sub>2</sub> *be filtered types. Then*

$$
\lambda Q.\ \exists Q\_1, Q\_2.
\begin{cases}
\mathcal{U}\_{A\_1}\, x\_1.\ Q\_1(x\_1)\\
\wedge\ \mathcal{U}\_{A\_2}\, x\_2.\ Q\_2(x\_2)\\
\wedge\ \forall x\_1, x\_2.\ Q\_1(x\_1) \wedge Q\_2(x\_2) \Rightarrow Q(x\_1, x\_2)
\end{cases}
$$

*is a filter on the product type* A<sub>1</sub> × A<sub>2</sub>*.*

To understand this definition, it is useful to consider the special case where A<sub>1</sub> and A<sub>2</sub> are both Z. Then, for i ∈ {1, 2}, the formula U<sub>A<sub>i</sub></sub> x<sub>i</sub>. Q<sub>i</sub> means that the predicate Q<sub>i</sub> contains an infinite interval of the form [a<sub>i</sub>, ∞). Thus, the formula ∀x<sub>1</sub>, x<sub>2</sub>. Q<sub>1</sub>(x<sub>1</sub>) ∧ Q<sub>2</sub>(x<sub>2</sub>) ⇒ Q(x<sub>1</sub>, x<sub>2</sub>) requires the predicate Q to contain the infinite rectangle [a<sub>1</sub>, ∞) × [a<sub>2</sub>, ∞). Thus, a predicate Q on Z<sup>2</sup> is "ultimately true" w.r.t. the product filter if and only if it is "true on some infinite rectangle". In Bourbaki's terminology [6, Chap. 1, Sect. 6.7], the infinite rectangles form a *basis* of the product filter.

We view the product filter as the default filter on the product type A<sub>1</sub> × A<sub>2</sub>. Whenever we refer to A<sub>1</sub> × A<sub>2</sub> in a setting where a filtered type is expected, the product filter is intended.

We stress that there are several filters on Z, including the universal filter and the order filter, and therefore several filters on Z<sup>k</sup>. Therefore, it does not make sense to use the O notation without specifying which filter one considers. Consider again the function g(n, m) in Fig. 3 (Sect. 2). One can prove that g(n, m) has complexity O(nm + n) with respect to the standard filter on Z<sup>2</sup>. With respect to *this filter*, this complexity bound is equivalent to O(mn), as the functions λ(n, m).nm + n and λ(n, m).nm dominate each other. Unfortunately, this *does not allow* deducing anything about the complexity of g(n, 0), since the bound O(mn) holds only when n and m grow large. An alternate approach is to prove that g(n, m) has complexity O(nm + n) with respect to a stronger filter, namely the product of the standard filter on Z (for n) and the universal filter on Z (for m). With respect to *that filter*, the functions λ(n, m).nm + n and λ(n, m).nm are *not* equivalent. This bound *does allow* instantiating m with 0 and deducing that g(n, 0) has complexity O(n).

### **3.4 Properties of Domination**

Many properties of the domination relation can be established with respect to an arbitrary filtered type A. Here are two example lemmas; there are many more. As before, f and g range over A → Z. The operators f + g, max(f, g) and f·g denote pointwise sum, maximum, and product, respectively.

**Lemma 6 (Sum and Max Are Alike).** *Assume* f *and* g *are ultimately nonnegative, that is,* U<sub>A</sub> x. f(x) ≥ 0 *and* U<sub>A</sub> x. g(x) ≥ 0 *hold. Then, we have* max(f, g) ≼<sub>A</sub> f + g *and* f + g ≼<sub>A</sub> max(f, g)*.*
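Lemma 6 reflects the pointwise inequalities max(f, g) ≤ f + g ≤ 2 · max(f, g), valid for nonnegative values; a quick numeric check (ours, not part of the formal development):

```python
# For nonnegative a and b, max(a, b) <= a + b <= 2 * max(a, b).
# These pointwise inequalities give both directions of Lemma 6,
# with multiplicative constants 1 and 2 respectively; the
# nonnegativity hypothesis matters (e.g. a = 5, b = -5 breaks them).
for a in range(100):
    for b in range(100):
        assert max(a, b) <= a + b <= 2 * max(a, b)
```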

**Lemma 7 (Multiplication).** f<sub>1</sub> ≼<sub>A</sub> g<sub>1</sub> *and* f<sub>2</sub> ≼<sub>A</sub> g<sub>2</sub> *imply* f<sub>1</sub>·f<sub>2</sub> ≼<sub>A</sub> g<sub>1</sub>·g<sub>2</sub>*.*

Lemma 7 corresponds to Howell's Property 5 [19]. Whereas Howell states this property on N<sup>k</sup>, our lemma is polymorphic in the type A. As noted by Howell, this lemma is useful when the cost of a loop body is independent of the loop index. In the case where the cost of the i-th iteration may depend on the loop index i, the following, more complex lemma is typically used instead:

**Lemma 8 (Summation).** *Let* f, g *range over* A → Z → Z*. Let* i<sub>0</sub> ∈ Z*. Assume the following three properties:*

*1.* U<sub>A</sub> a. ∀i ≥ i<sub>0</sub>. f(a)(i) ≥ 0*.*

*2.* U<sub>A</sub> a. ∀i ≥ i<sub>0</sub>. g(a)(i) ≥ 0*.*

*3. for every* a*, the function* λi.f(a)(i) *is nondecreasing on the interval* [i<sub>0</sub>, ∞)*.*

*Then,*

$$
\lambda(a, i).f(a)(i) \quad \preceq\_{A \times \mathbb{Z}} \quad \lambda(a, i).g(a)(i)
$$

*implies*

$$\lambda(a,n).\ \sum\_{i=i\_0}^n f(a)(i)\ \preceq\_{A\times\mathbb{Z}}\ \lambda(a,n).\ \sum\_{i=i\_0}^n g(a)(i).$$

Lemma 8 uses the product filter on A × Z in its hypothesis and conclusion. It corresponds to Howell's Property 2 [19]. The variable i represents the loop index, while the variable a collectively represents all other variables in scope, so the type A is usually instantiated with a tuple type (an example appears in Sect. 6).

An important property is the fact that function composition is compatible, in a certain sense, with domination. This allows transforming the parameters under which an asymptotic analysis is carried out (examples appear in Sect. 6). Due to space limitations, we refer the reader to the Coq library for details [16].

### **3.5 Tactics**

Our formalization of filters and domination forms a stand-alone Coq library [16]. In addition to many lemmas about these notions, the library proposes automated tactics that can prove nonnegativity, monotonicity, and domination goals. These tactics currently support functions built out of variables, constants, sums, maxima, products, powers, and logarithms. Extending their coverage is ongoing work. This library is not tied to our application to the complexity analysis of programs; it could have other applications in mathematics.

### **4 Specifications with Asymptotic Complexity Claims**

In this section, we first present our existing approach to verified time complexity analysis. This approach, proposed by the second and third authors [11], does not use the O notation: instead, it involves explicit cost functions. We then discuss how to extend this approach with support for asymptotic complexity claims. We find that, even once domination (Sect. 3) is well-understood, there remain nontrivial questions as to the style in which program specifications should be written. We propose one style which works well on small examples and which we believe should scale well to larger ones.

### **4.1 CFML with Time Credits for Cost Analysis**

CFML [9,10] is a system that supports the interactive verification of OCaml programs, using higher-order Separation Logic, inside Coq. It is composed of a trusted standalone tool and a Coq library. The CFML tool transforms a piece of OCaml code into a *characteristic formula*, a Coq formula that describes the semantics of the code. The characteristic formula is then exploited, inside Coq, to state that the code satisfies a certain specification (a Separation Logic triple) and to interactively prove this statement. The CFML library provides a set of Coq tactics that implement the reasoning rules of Separation Logic.

In prior work [11], the second and third authors extend CFML with time credits [2,22] and use it to simultaneously verify the functional correctness and the (amortized) time complexity of OCaml code. To illustrate the style in which they write specifications, consider a function that computes the length of a list:

```
let rec length l =
    match l with
    | [] -> 0
    | _ :: l -> 1 + length l
```
About this function, one can prove the following statement:

```
∀(A : Type)(l : list A). { $(|l| + 1) } (length l) {λy. [ y = |l| ]}
```
This is a Separation Logic triple {H} (t) {Q}. The postcondition λy. [ y = |l| ] asserts that the call length l returns the length of the list l.<sup>3</sup> The precondition $(|l| + 1) asserts that this call requires |l| + 1 credits. This triple is proved in a variant of Separation Logic where every function call and every loop iteration consumes one credit. Thus, the above specification guarantees that the execution of length l involves no more than |l| + 1 function calls or loop iterations. Our previous paper [11, Definition 2] gives a precise definition of the meaning of triples.

As argued in prior work [11, Sect. 2.7], bounding the number of function calls and loop iterations is equivalent, up to a constant factor, to bounding the number of reduction steps of the program. Assuming that the OCaml compiler is complexity-preserving, this is equivalent, up to a constant factor, to bounding the number of instructions executed by the compiled code. Finally, assuming that the machine executes one instruction in bounded time, this is equivalent, up to a constant factor, to bounding the execution time of the compiled code. Thus, the above specification guarantees that length runs in linear time.

Instead of understanding Separation Logic with Time Credits as a variant of Separation Logic, one can equivalently view it as standard Separation Logic, applied to an instrumented program, where a pay() instruction has been inserted at the beginning of every function body and loop body. The proof of the program is carried out under the axiom {$1} (pay()) {λ_. [ ]}, which imposes the consumption of one time credit at every pay() instruction. This instruction has no runtime effect: it is just a way of marking where credits must be consumed.

For example, the OCaml function length is instrumented as follows:

```
let rec length l =
    pay();
    match l with [] -> 0 | _ :: l -> 1 + length l
```
Executing "length l" involves executing pay() exactly |l| + 1 times. For this reason, a valid specification of this instrumented code in ordinary Separation Logic must require at least |l| + 1 credits in its precondition.

### **4.2 A Modularity Challenge**

The above specification of length guarantees that length runs in linear time, but does not allow predicting how much real time is consumed by a call to length. Thus, this specification is already rather abstract. Yet, it is still too precise. Indeed, we believe that it would not be wise for a list library to publish a specification of length whose precondition requires exactly |l| + 1 credits. In fact, there are implementations of length that do not meet this specification. For example, the tail-recursive implementation found in the OCaml standard library, which in practice is more efficient than the naïve implementation shown

<sup>3</sup> The square brackets denote a pure Separation Logic assertion. <sup>|</sup>l<sup>|</sup> denotes the length of the Coq list l. CFML transparently reflects OCaml integers as Coq relative integers and OCaml lists as Coq lists.

above, involves exactly <sup>|</sup>l<sup>|</sup> + 2 function calls, therefore requires <sup>|</sup>l<sup>|</sup> + 2 credits. By advertising a specification where <sup>|</sup>l<sup>|</sup> + 1 credits suffice, one makes too strong a guarantee, and rules out the more efficient implementation.

After initially publishing a specification that requires $(|l| + 1), one could of course still switch to the more efficient implementation and update the published specification so as to require $(|l| + 2) instead of $(|l| + 1). However, that would in turn require updating the specification and proof of every (direct and indirect) client of the list library, which is intolerable.

To leave some slack, one should publish a more abstract specification. For example, one could advertise that the cost of length l is an affine function of the length of the list l, that is, the cost is a · |l| + b, for some constants a and b:

$$\exists (a, b: \mathbb{Z}). \ \forall (A: \mathsf{Type})\, (l: \mathsf{list} \, A). \ \{\$(a \cdot |l| + b)\} \, (\mathsf{Length} \, l) \, \{\lambda y. \, [y = |l|] \}$$

This is a better specification, in the sense that it is more modular. The naïve implementation of length shown earlier and the efficient implementation in OCaml's standard library both satisfy this specification, so one is free to choose one or the other, without any impact on the clients of the list library. In fact, any reasonable implementation of length should have linear time complexity and therefore should satisfy this specification.

That said, the style in which the above specification is written is arguably slightly too low-level. Instead of directly expressing the idea that the cost of length l is O(|l|), we have written this cost under the form a · |l| + b. It is preferable to state at a more abstract level that *cost* is dominated by λn. n: such a style is more readable and scales to situations where multiple parameters and nonstandard filters are involved. Thus, we propose the following statement:

$$\exists \mathit{cost} : \mathbb{Z} \to \mathbb{Z}. \left\{ \begin{aligned} &\mathit{cost} \preceq_{\mathbb{Z}} \lambda n.\, n \\ &\forall (A : \mathsf{Type})\, (l : \mathsf{list} \, A). \ \{ \$\mathit{cost}(|l|) \} \, (\mathsf{Length} \, l) \, \{ \lambda y. \, [y = |l|] \} \end{aligned} \right.$$

Thereafter, we refer to the function *cost* as the *concrete cost* of length, as opposed to the *asymptotic bound*, represented here by the function λn. n. This specification asserts that there exists a concrete cost function *cost*, which is dominated by λn. n, such that *cost*(|l|) credits suffice to justify the execution of length l. Thus, *cost*(|l|) is an upper bound on the actual number of pay() instructions that are executed at runtime.

The above specification informally means that length l has time complexity O(n), where the parameter n represents |l|, that is, the length of the list l. The fact that n represents |l| is expressed by applying *cost* to |l| in the precondition. The fact that this analysis is valid when n grows large enough is expressed by using the standard filter on Z in the assertion *cost* ≼<sub>Z</sub> λn. n.

In general, it is up to the user to choose what the parameters of the cost analysis should be, what these parameters represent, and which filter on these parameters should be used. The example of the Bellman-Ford algorithm (Sect. 6) illustrates this.

```
Record specO (A : filterType) (le : A → A → Prop)
             (bound : A → Z) (P : (A → Z) → Prop)
  := { cost : A → Z;
       cost_spec : P cost;
       cost_dominated : dominated A cost bound;
       cost_nonneg : ∀ x, 0 ≤ cost x;
       cost_monotonic : monotonic le Z.le cost; }.
```
**Fig. 5.** Definition of specO.

### **4.3 A Record for Specifications**

The specifications presented in the previous section share a common structure. We define a record type that captures this common structure, so as to make specifications more concise and more recognizable, and so as to help users adhere to this specification pattern.

This type, specO, is defined in Fig. 5. The first three fields in this record type correspond to what has been explained so far. The first field asserts the existence of a function cost of A to Z, where A is a user-specified filtered type. The second field asserts that a certain property P cost is satisfied; it is typically a Separation Logic triple whose precondition refers to cost. The third field asserts that cost is dominated by the user-specified function bound. The need for the last two fields is explained further on (Sects. 4.4 and 4.5).

Using this definition, our proposed specification of length (Sect. 4.2) is stated in concrete Coq syntax as follows:

```
Theorem length_spec:
    specO Z_filterType Z.le (fun n ⇒ n) (fun cost ⇒
      ∀A (l:list A), triple (length l)
        PRE ($ (cost |l|))
        POST (fun y ⇒ [ y = |l| ]))
```
The key elements of this specification are Z\_filterType, which is Z, equipped with its standard filter; the asymptotic bound fun n ⇒ n, which means that the time complexity of length is O(n); and the Separation Logic triple, which describes the behavior of length and refers to the concrete cost function cost.

One key technical point is that specO is a strong existential, whose witness can be referred to via the first projection, cost. For instance, the concrete cost function associated with length can be referred to as cost length\_spec. Thus, at a call site of the form length xs, the number of required credits is cost length\_spec |xs|.

In the next subsections, we explain why, in the definition of specO, we require the concrete cost function to be nonnegative and monotonic. These are design decisions; although these properties may not be strictly necessary, we find that enforcing them greatly simplifies things in practice.

### **4.4 Why Cost Functions Must Be Nonnegative**

One is commonly faced with the obligation of proving that a cost expression is nonnegative. These proof obligations arise from several sources.

One source is the Separation Logic axiom for splitting credits, whose statement is $(m + n) = $m ∗ $n, subject to the side conditions m ≥ 0 and n ≥ 0. Without these side conditions, out of $0, one would be able to create $1 ∗ $(−1). Because our logic is affine, one could then discard $(−1), keeping just $1. In short, an unrestricted splitting axiom would allow creating credits out of thin air.<sup>4</sup> Another source of proof obligations is the Summation lemma (Lemma 8), which requires the functions at hand to be (ultimately) nonnegative.

Now, suppose one is faced with the obligation of proving that the expression cost length\_spec |xs| is nonnegative. Because length\_spec is an existential package (a specO record), this is impossible, unless this information has been recorded up front within the record. This is the reason why the field cost\_nonneg in Fig. 5 is needed.

For simplicity, we require cost functions to be nonnegative everywhere, as opposed to within a certain domain. This requirement is stronger than necessary, but simplifies things, and can easily be met in practice by wrapping cost functions within "max(0, −)". Our Coq tactics automatically insert "max(0, −)" wrappers where necessary, making this issue mostly transparent to the user. In the following, for brevity, we write c⁺ for max(0, c), where c ∈ Z.

### **4.5 Why Cost Functions Must Be Monotonic**

One key reason why cost functions should be monotonic has to do with the "avoidance problem". When the cost of a code fragment depends on a local variable x, can this cost be reformulated (and possibly approximated) in such a way that the dependency is removed? Indeed, a cost expression that makes sense outside the scope of x is ultimately required.

The problematic cost expression is typically of the form E[|x|], where |x| represents some notion of the "size" of the data structure denoted by x, and E is an arithmetic context, that is, an arithmetic expression with a hole. Furthermore, an upper bound on |x| is typically available. This upper bound can be exploited if the context E is monotonic, i.e., if x ≤ y implies E[x] ≤ E[y]. Because the hole in E can appear as an actual argument to an abstract cost function, we must record the fact that this cost function is monotonic.

To illustrate the problem, consider the following OCaml function, which counts the positive elements in a list of integers. It does so, in linear time, by first building a sublist of the positive elements, then computing the length of this sublist.

<sup>4</sup> Another approach would be to define $n only for n ∈ N, in which case an unrestricted axiom would be sound. However, as we use Z everywhere, that would be inconvenient. A more promising idea is to view $n as linear (as opposed to affine) when n is negative. Then, $(−1) cannot be discarded, so unrestricted splitting is sound.

```
let count_pos l =
  let l' = List.filter (fun x -> x > 0) l in
  List.length l'
```

How would one go about proving that this code actually has linear time complexity? On paper, one would informally argue that the cost of the sequence pay(); filter; length is O(1) + O(|l|) + O(|l'|), then exploit the inequality |l'| ≤ |l|, which follows from the semantics of filter, and deduce that the cost is O(|l|).

In a formal setting, though, the problem is not so simple. Assume that we have two specification lemmas length\_spec and filter\_spec for List.length and List.filter, which describe the behavior of these OCaml functions and guarantee that they have linear-time complexity. For brevity, let us write just g and f for the functions cost length\_spec and cost filter\_spec. Also, at the mathematical level, let us write l↓ for the sublist of the positive elements of the list l. It is easy enough to check that the cost of the expression "pay(); let l' = ... in List.length l'" is 1 + f(|l|) + g(|l↓|). The problem, now, is to *find an upper bound* for this cost *that does not depend on* l', a local variable, and to verify that this upper bound, *expressed as a function of* |l|, is dominated by λn. n. Indeed, this is required in order to establish a specO statement about count\_pos.

What might this upper bound be? That is, which functions *cost* of Z to Z are such that (A) 1 + f(|l|) + g(|l↓|) ≤ *cost*(|l|) can be proved (in the scope of the local variable l') and (B) *cost* ≼<sub>Z</sub> λn. n holds? Three potential answers come to mind:


$$\mathit{cost} = \lambda n. \max_{0 \le n' \le n} \big( 1 + f(n) + g(n') \big)$$

Furthermore, for this definition of *cost*, the domination assertion (B) holds as well. The proof relies on the fact that the functions g and ĝ, where ĝ is λn. max<sub>0≤n′≤n</sub> g(n′) [19], dominate each other. Although this approach seems viable, and does not require the function g to be monotonic, it is a bit more complicated than we would like.

3. Let us now assume that the function g is monotonic, that is, nondecreasing. As before, within the scope of l', the inequality |l↓| ≤ |l| is available. Thus, the cost expression 1 + f(|l|) + g(|l↓|) is bounded by 1 + f(|l|) + g(|l|). Therefore, inequalities (A) and (B) are satisfied, provided we take:

$$cost = \lambda n. \, 1 + f(n) + g(n)$$

We believe that approach 3 is the simplest and most intuitive, because it allows us to easily eliminate l', without giving rise to a complicated cost function, and without the need for a running maximum.
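To make approach 3 concrete, here is a small executable sketch of our own (not CFML code, and all names are introduced here for illustration): pay() is modeled as a counter increment, filter and length are hand-instrumented so that their concrete costs are f(n) = n + 1 and g(n) = n + 1, and we check on a few sample lists that cost = λn. 1 + f(n) + g(n) indeed bounds the number of pay() instructions executed by count_pos.

```ocaml
(* Sketch: checking, on sample inputs, that the monotonic bound of
   approach 3 dominates the actual number of pay() instructions. *)
let steps = ref 0
let pay () = incr steps

(* Hand-instrumented versions; concrete costs f(n) = n + 1, g(n) = n + 1. *)
let rec length l = pay (); match l with [] -> 0 | _ :: l -> 1 + length l
let rec filter p l =
  pay ();
  match l with
  | [] -> []
  | x :: l -> if p x then x :: filter p l else filter p l

let count_pos l =
  pay ();
  let l' = filter (fun x -> x > 0) l in
  length l'

(* Approach 3: cost = λn. 1 + f(n) + g(n). *)
let cost n = 1 + (n + 1) + (n + 1)

let () =
  List.iter
    (fun l ->
      steps := 0;
      ignore (count_pos l);
      assert (!steps <= cost (List.length l)))
    [ []; [1]; [-1; 2; -3]; [5; 6; 7; 8] ]
```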

However, this approach requires that the cost function g, which is short for cost length\_spec, be monotonic. This explains why we build a monotonicity condition into the definition of specO (Fig. 5, last line). Another motivation for doing so is the fact that some lemmas (such as Lemma 8, which allows reasoning about the asymptotic cost of an inner loop) also have monotonicity hypotheses.

The reader may be worried that, in practice, there might exist concrete cost functions that are not monotonic. This may be the case, in particular, of a cost function f that is obtained as the solution of a recurrence equation. Fortunately, in the common case of functions of Z to Z, the "running maximum" function f̂ can always be used in place of f: indeed, it is monotonic and has the same asymptotic behavior as f. Thus, we see that both approaches 2 and 3 above involve running maxima in some places, but their use seems less frequent with approach 3.

### **5 Interactive Proofs of Asymptotic Complexity Claims**

To prove a specification lemma, such as length\_spec (Sect. 4.3) or loop\_spec (Sect. 4.4), one must construct a specO record. By definition of specO (Fig. 5), this means that one must exhibit a concrete cost function *cost* and prove a number of properties of this function, including the fact that, when supplied with \$(*cost* ...), the code runs correctly (cost\_spec) and the fact that *cost* is dominated by the desired asymptotic bound (cost\_dominated).

Thus, the very first step in a naïve proof attempt would be to *guess* an appropriate cost function for the code at hand. However, such an approach would be painful, error-prone, and brittle. It seems much preferable, if possible, to enlist the machine's help in *synthesizing* a cost function *at the same time as we step through the code*—which we have to do anyway, as we must build a Separation Logic proof of the correctness of this code.

To illustrate the problem, consider the recursive function p, whose integer argument n is expected to satisfy n ≥ 0. For the sake of this example, p calls an auxiliary function g, which we assume runs in constant time.

```
let rec p n =
    if n <= 1 then () else begin g(); p(n-1) end
```
Suppose we wish to establish that p runs in linear time. As argued at the beginning of the paper (Sect. 2, Fig. 2), it does not make sense to attempt a proof by induction on n that "p n runs in time O(n)". Instead, in a formal framework, we must exhibit a concrete cost function *cost* such that *cost*(n) credits justify the call p n and *cost* grows linearly, that is, *cost* ≼<sub>Z</sub> λn. n.

Let us assume that a specification lemma g\_spec for the function g has been established already, so the number of credits required by a call to g is cost g\_spec (). In the following, we write G as a shorthand for this constant.

Because this example is very simple, it is reasonably easy to manually come up with an appropriate cost function for p. One valid guess is λn. 1 + Σ<sub>i=2</sub><sup>n</sup>(1 + G). Another valid guess, obtained via a simplification step, is λn. 1 + (1 + G)(n − 1)⁺. Another witness, obtained via an approximation step, is λn. 1 + (1 + G)n⁺. As the reader can see, there is in fact a spectrum of valid witnesses, ranging from verbose, low-level to compact, high-level mathematical expressions. Also, it should be evident that, as the code grows larger, it can become very difficult to guess a valid concrete cost function.
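As a sanity check, the following sketch (ours, not CFML code; pay() is modeled as a counter increment, and the auxiliary function g is given constant cost G = 1) verifies on a few inputs that the guess λn. 1 + (1 + G)n⁺ bounds the number of pay() instructions executed by p n.

```ocaml
(* Sketch: counting pay() instructions executed by p. *)
let steps = ref 0
let pay () = incr steps

let g () = pay ()   (* auxiliary function of constant cost G = 1 *)

let rec p n =
  pay ();
  if n <= 1 then () else begin g (); p (n - 1) end

(* Guessed cost function λn. 1 + (1 + G) n⁺, with G = 1. *)
let cost n = 1 + 2 * max 0 n

let () =
  List.iter
    (fun n ->
      steps := 0;
      p n;
      assert (!steps <= cost n))
    [ 0; 1; 2; 5; 10 ]
```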

This gives rise to two questions. Among the valid cost functions, which one is preferable? Which ones can be systematically constructed, without guessing?

Among the valid cost functions, there is a tradeoff. At one extreme, a low-level cost function has exactly the same syntactic structure as the code, so it is easy to prove that it is an upper bound for the actual cost of the code, but a lot of work may be involved in proving that it is dominated by the desired asymptotic bound. At the other extreme, a high-level cost function can be essentially identical to the desired asymptotic bound, up to explicit multiplicative and additive constants, so the desired domination assertion is trivial, but a lot of accounting work may be involved in proving that this function represents enough credits to execute the code. Thus, by choosing a cost function, we shift some of the burden of the proof from one subgoal to another. From this point of view, no cost function seems inherently preferable to another.

From the point of view of systematic construction, however, the answer is more clear-cut. It seems fairly clear that it is possible to systematically build a cost function whose syntactic structure is the same as the syntactic structure of the code. This idea goes at least as far back as Wegbreit's work [26]. Coming up with a compact, high-level expression of the cost, on the other hand, seems to require human insight.

To provide as much machine assistance as possible, our system mechanically synthesizes a low-level cost expression for a piece of OCaml code. This is done transparently, at the same time as the user constructs a proof of the code in Separation Logic. Furthermore, we take advantage of the fact that we are using an interactive proof assistant: we allow the user to guide the synthesis process. For instance, the user controls how a local variable should be eliminated, how the cost of a conditional construct should be approximated (i.e., by a conditional or by a maximum), and how recurrence equations should be solved. In the following, we present this semi-interactive synthesis process. We first consider straight-line (nonrecursive) code (Sect. 5.1), then recursive functions (Sect. 5.2).

### **5.1 Synthesizing Cost Expressions for Straight-Line Code**

The CFML library provides the user with interactive tactics that implement the reasoning rules of Separation Logic. We set things up in such a way that, as these rules are applied, a cost expression is automatically synthesized.


**Fig. 6.** The reasoning rules of Separation Logic, specialized for cost synthesis.

To this end, we use specialized variants of the reasoning rules, whose premises and conclusions take the form {$n ∗ H} (e) {Q}. Furthermore, to simplify the nonnegativeness side conditions that must be proved while reasoning, we make all cost expressions obviously nonnegative by wrapping them in max(0, −). Recall that c⁺ stands for max(0, c), where c ∈ Z. Our reasoning rules work with triples of the form {$c⁺ ∗ H} (e) {Q}. They are shown in Fig. 6.

Because we wish to *synthesize* a cost expression, our Coq tactics maintain the following invariant: whenever the goal is {$c⁺ ∗ H} (e) {Q}, the cost c is *uninstantiated*, that is, it is represented in Coq by a metavariable, a placeholder. This metavariable is instantiated when the goal is proved by applying one of the reasoning rules. Such an application produces new subgoals, whose preconditions contain new metavariables. As this process is repeated, a cost expression is incrementally constructed.

The rule WeakenCost is a special case of the consequence rule of Separation Logic. It is typically used once at the root of the proof: even though the initial goal {$c₁ ∗ H} (e) {Q} may not satisfy our invariant, because it lacks a −⁺ wrapper and because c₁ is not necessarily a metavariable, WeakenCost gives rise to a subgoal {$c₂⁺ ∗ H} (e) {Q} that satisfies it. Indeed, when this rule is applied, a fresh metavariable c₂ is generated. WeakenCost can also be explicitly applied by the user when desired. It is typically used just before leaving the scope of a local variable x to approximate a cost expression c₂⁺ that depends on x with an expression c₁ that does not refer to x.

The Seq rule is a special case of the Let rule. It states that the cost of a sequence is the sum of the costs of its subexpressions. When this rule is applied to a goal of the form {$c⁺ ∗ H} (e) {Q}, where c is a metavariable, two new metavariables c₁ and c₂ are introduced, and c is instantiated with c₁⁺ + c₂⁺.

The Let rule is similar to Seq, but involves an additional subtlety: the cost c₂ must not refer to the local variable x. Naturally, Coq enforces this condition: any attempt to instantiate the metavariable c₂ with an expression where x occurs fails. In such a situation, it is up to the user to use WeakenCost so as to avoid this dependency. The example of count\_pos (Sect. 4.5) illustrates this issue.

The Val rule handles values, which in our model have zero cost; its premise involves entailment between Separation Logic assertions.

The If rule states that the cost of an OCaml conditional expression is a mathematical conditional expression. Although this may seem obvious, one subtlety lurks here. Using WeakenCost, the cost expression *if* b *then* c₁ *else* c₂ can be approximated by max(c₁, c₂). Such an approximation can be beneficial, as it leads to a simpler cost expression, or harmful, as it causes a loss of information. In particular, when carried out in the body of a recursive function, it can lead to an unsatisfiable recurrence equation. We let the user decide whether this approximation should be performed.

The Pay rule handles the pay() instruction, which is inserted by the CFML tool at the beginning of every function and loop body (Sect. 4.1). This instruction costs one credit.

The For rule states that the cost of a for loop is the sum, over all values of the index i, of the cost of the i-th iteration of the body. In practice, it is typically used in conjunction with WeakenCost, which allows the user to simplify and approximate the iterated sum Σ<sub>a≤i&lt;b</sub> c(i)⁺. In particular, if the synthesized cost c(i) happens to not depend on i, or can be approximated so as to not depend on i, then this iterated sum can be expressed under the form c · (b − a)⁺. A variant of the For rule, not shown, covers this common case. There is in principle no need for a primitive treatment of loops, as loops can be encoded in terms of higher-order recursive functions, and our program logic can express the specifications of these combinators. Nevertheless, in practice, primitive support for loops is convenient.

This concludes our exposition of the reasoning rules of Fig. 6. Coming back to the example of the OCaml function p (Sect. 5), under the assumption that the cost of the recursive call p(n-1) is f(n−1), we are able, by repeated application of the reasoning rules, to automatically find that the cost of the OCaml expression:

```
if n <= 1 then () else begin g(); p(n-1) end
```

is: 1 + *if* n ≤ 1 *then* 0 *else* (G + f(n − 1)). The initial 1 accounts for the implicit pay(). This may seem obvious, and it is. The point is that this cost expression is automatically constructed: its synthesis adds no overhead to an interactive proof of functional correctness of the function p.

### **5.2 Synthesizing and Solving Recurrence Equations**

There now remains to explain how to deal with recursive functions. Suppose S(f) is the Separation Logic triple that we wish to establish, where f stands for an as-yet-unknown cost function. Following common informal practice, we would like to do this in two steps. First, from the code, derive a "recurrence equation" E(f), which in fact is usually not an equation, but a constraint (or a conjunction of constraints) bearing on f. Second, prove that this recurrence equation admits a solution that is dominated by the desired asymptotic cost function g. This approach can be formally viewed as an application of the following tautology:

$$\forall E. \ (\forall f. E(f) \to S(f)) \to (\exists f. E(f) \land f \preceq g) \to (\exists f. S(f) \land f \preceq g)$$

The conclusion S(f) ∧ f ≼ g states that the code is correct and has asymptotic cost g. In Coq, applying this tautology gives rise to a new metavariable E, as the recurrence equation is initially unknown, and two subgoals.

During the proof of the first subgoal, ∀f. E(f) → S(f), the cost function f is abstract (universally quantified), but we are allowed to assume E(f), where E is initially a metavariable. So, should the need arise to prove that f satisfies a certain property, this can be done just by instantiating E. In the example of the OCaml function p (Sect. 5), we prove S(f) by induction over n, under the hypothesis n ≥ 0. Thus, we assume that the cost of the recursive call p(n-1) is f(n − 1), and must prove that the cost of p n is f(n). We synthesize the cost of p n as explained earlier (Sect. 5.1) and find that this cost is 1 + *if* n ≤ 1 *then* 0 *else* (G + f(n − 1)). We apply WeakenCost and find that our proof is complete, provided we are able to prove the following inequation:

$$1 + if \ n \le 1 \ then \ 0 \ else \ (G + f(n - 1)) \le f(n)$$

We achieve that simply by instantiating E as follows:

$$E := \lambda f. \ \forall n. \ n \ge 0 \to 1 + if \,\, n \le 1 \,\,then \,\,0 \,\,else \,\,(G + f(n - 1)) \le \, f(n).$$

This is our "recurrence equation"—in fact, a universally quantified, conditional inequation. We are done with the first subgoal.

We then turn to the second subgoal, ∃f. E(f) ∧ f ≼ g. The metavariable E is now instantiated. The goal is to solve the recurrence and analyze the asymptotic growth of the chosen solution. There are at least three approaches to solving such a recurrence.

First, one can guess a closed form that satisfies the recurrence. For example, the function f := λn. 1 + (1 + G)n⁺ satisfies E(f) above. But, as argued earlier, guessing is in general difficult and tedious.
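To check E(f) for this guess, note that, for n > 1 (so that n⁺ = n and (n − 1)⁺ = n − 1):

$$1 + G + f(n-1) \;=\; 1 + G + 1 + (1+G)(n-1) \;=\; 1 + (1+G)\,n \;=\; f(n),$$

while, for 0 ≤ n ≤ 1, the requirement boils down to 1 ≤ f(n), which holds since f(n) ≥ 1.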

Second, one can invoke Cormen *et al.*'s Master Theorem [12] or the more general Akra-Bazzi theorem [1,21]. Unfortunately, at present, these theorems are not available in Coq, although an Isabelle/HOL formalization exists [13].

The last approach is Cormen *et al.*'s substitution method [12, Sect. 4]. The idea is to guess a parameterized *shape* for the solution; substitute this shape into the goal; gather a set of constraints that the parameters must satisfy for the goal to hold; finally, show that these constraints are indeed satisfiable. In the above example, as we expect the code to have linear time complexity, we propose that the solution f should have the shape λn. (an⁺ + b), where a and b are parameters, about which we wish to gradually accumulate a set of constraints. From a formal point of view, this amounts to applying the following tautology:

$$\forall P. \ \forall C. \quad \left(\forall a\, b. \ C(a,b) \to P(\lambda n.\,(an^+ + b))\right) \to \left(\exists a\, b. \ C(a,b)\right) \to \exists f. \ P(f)$$

This application again yields two subgoals. During the proof of the first subgoal, C is a metavariable and can be instantiated as desired (possibly in several steps), allowing us to gather a conjunction of constraints bearing on a and b. During the proof of the second subgoal, C is fixed and we must check that it is satisfiable. In our example, the first subgoal is:

$$E(\lambda n.\,(an^+ + b)) \quad \land \quad \lambda n.\,(an^+ + b) \preceq_{\mathbb{Z}} \lambda n.\,n.$$

The second conjunct is trivial. The first conjunct simplifies to:

$$\forall n. \quad n \ge 0 \to 1 + if \ n \le 1 \ then \ 0 \ else \ (G + a(n - 1)^+ + b) \le an^+ + b.$$

By distinguishing the cases n = 0, n = 1, and n > 1, we find that this property holds provided we have 1 ≤ b and 1 ≤ a + b and 1 + G ≤ a. Thus, we prove this subgoal by instantiating C with λ(a, b). (1 ≤ b ∧ 1 ≤ a + b ∧ 1 + G ≤ a).
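In detail, the three cases unfold as follows (using 0⁺ = 0 and n⁺ = n for n ≥ 1):

$$\begin{array}{lll} n = 0: & 1 + 0 \le a \cdot 0 + b & \iff \ 1 \le b \\ n = 1: & 1 + 0 \le a + b & \iff \ 1 \le a + b \\ n > 1: & 1 + G + a(n-1) + b \le an + b & \iff \ 1 + G \le a \end{array}$$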

There remains to check the second subgoal, that is, ∃a b. C(a, b). This is easy; we pick, for instance, a := 1 + G and b := 1. This concludes our use of Cormen *et al.*'s substitution method.

In summary, by exploiting Coq's metavariables, we are able to set up our proofs in a style that closely follows the traditional paper style. During a first phase, as we analyze the code, we synthesize a cost function and (if the code is recursive) a recurrence equation. During a second phase, we guess the shape of a solution, and, as we analyze the recurrence equation, we synthesize a constraint on the parameters of the shape. During a last phase, we check that this constraint is satisfiable. In practice, instead of explicitly building and applying tautologies as above, we use the first author's procrastination library [16], which provides facilities for introducing new parameters, gradually gathering constraints on these parameters, and eventually checking that these constraints are satisfiable.

### **6 Examples**

**Binary Search.** We prove that binary search has time complexity O(log n), where n = j − i denotes the width of the search interval [i, j). The code is as in Fig. 1, except that the flaw is fixed by replacing i+1 with k+1 on the last line. As outlined earlier (Sect. 5), we synthesize the following recurrence equation on the cost function f:

$$f(0) + 3 \le f(1) \quad \land \quad \forall n \ge 0. \ 1 \le f(n) \quad \land \quad \forall n \ge 2. \ f(n/2) + 3 \le f(n)$$

We apply the substitution method and search for a solution of the form λn. *if* n ≤ 0 *then* 1 *else* a log n + b, which is dominated by λn. log n. Substituting this shape into the above constraints, we find that they boil down to (4 ≤ b) ∧ (0 ≤ a ∧ 1 ≤ b) ∧ (3 ≤ a). Finally, we guess a solution, namely a := 3 and b := 4.
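As an illustration, here is an instrumented binary search of our own (a sketch; it is not the code of Fig. 1, and pay() is modeled as a counter increment), together with a check that the solution λn. *if* n ≤ 0 *then* 1 *else* 3 log n + 4 bounds the number of pay() instructions executed. Our simplified instrumentation pays only once per call, so the bound is satisfied with room to spare.

```ocaml
(* Sketch: instrumented binary search over the sorted segment a.[i, j). *)
let steps = ref 0
let pay () = incr steps

let rec bsearch a v i j =
  pay ();
  if j <= i then -1
  else
    let k = i + (j - i) / 2 in
    if v = a.(k) then k
    else if v < a.(k) then bsearch a v i k
    else bsearch a v (k + 1) j

let rec log2 n = if n <= 1 then 0 else 1 + log2 (n / 2)

(* Solution of the recurrence: λn. if n ≤ 0 then 1 else 3 log n + 4. *)
let cost n = if n <= 0 then 1 else 3 * log2 n + 4

let () =
  let a = Array.init 100 (fun i -> 2 * i) in
  List.iter
    (fun v ->
      steps := 0;
      ignore (bsearch a v 0 (Array.length a));
      assert (!steps <= cost (Array.length a)))
    [ -1; 0; 13; 57; 198; 500 ]
```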

**Dependent Nested Loops.** Many algorithms involve dependent nested for loops, that is, nested loops, where the bounds of the inner loop depend on the outer loop index, as in the following simplified example:

```
for i = 1 to n do for j = 1 to i do () done done
```

For this code, the cost function λn. Σ<sub>i=1</sub><sup>n</sup>(1 + Σ<sub>j=1</sub><sup>i</sup> 1) is synthesized. There remains to prove that it is dominated by λn. n<sup>2</sup>. We could recognize and prove that this function is equal to λn. n(n+3)/2, which clearly is dominated by λn. n<sup>2</sup>. This works because this example is trivial, but, in general, computing explicit closed forms for summations is challenging, if at all feasible.
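The closed form follows by direct computation:

$$\sum_{i=1}^{n}\Big(1 + \sum_{j=1}^{i} 1\Big) \;=\; \sum_{i=1}^{n} (1 + i) \;=\; n + \frac{n(n+1)}{2} \;=\; \frac{n(n+3)}{2}$$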


**A Loop Whose Body Has Exponential Cost.** In the following simple example, the loop body is just a function call:

```
for i=0 to n-1 do b(i) done
```
Thus, the cost of the loop body is not known exactly. Instead, let us assume that a specification for the auxiliary function b has been proved and that its cost is O(2<sup>i</sup>), that is, cost b ≼<sub>Z</sub> λi. 2<sup>i</sup> holds. We then wish to prove that the cost of the whole loop is also O(2<sup>n</sup>).

For this loop, the cost function λn. Σ<sub>i=0</sub><sup>n</sup>(1 + cost b (i)) is automatically synthesized. We have an asymptotic bound for the cost of the loop body, namely: λi. 1 + cost b (i) ≼<sub>Z</sub> λi. 2<sup>i</sup>. The side conditions of the Summation lemma (Lemma 8) are met: in particular, the function λi. 1 + cost b (i) is monotonic. The lemma yields λn. Σ<sub>i=0</sub><sup>n</sup>(1 + cost b (i)) ≼<sub>Z</sub> λn. Σ<sub>i=0</sub><sup>n</sup> 2<sup>i</sup>. Finally, we have λn. Σ<sub>i=0</sub><sup>n</sup> 2<sup>i</sup> = λn. 2<sup>n+1</sup> − 1 ≼<sub>Z</sub> λn. 2<sup>n</sup>.

**The Bellman-Ford Algorithm.** We verify the asymptotic complexity of an implementation of the Bellman-Ford algorithm, which computes shortest paths in a weighted graph with n vertices and m edges. The algorithm involves an outer loop that is repeated n−1 times and an inner loop that iterates over all m edges. The specification asserts that the asymptotic complexity is O(nm):

$$\exists \mathit{cost} : \mathbb{Z}^2 \to \mathbb{Z}.\; \begin{cases} \mathit{cost} \preceq_{\mathbb{Z}^2} \lambda(m, n).\, nm \\ \{\$\mathit{cost}(\#\mathit{edges}(g), \#\mathit{vertices}(g))\}\ (\mathsf{bellmanford}\ g)\ \{\ldots\} \end{cases}$$

By exploiting the fact that a graph without duplicate edges must satisfy m ≤ n<sup>2</sup>, we prove that the complexity of the algorithm, viewed as a function of n, is O(n<sup>3</sup>).

$$\exists \mathit{cost} : \mathbb{Z} \to \mathbb{Z}.\; \begin{cases} \mathit{cost} \preceq_{\mathbb{Z}} \lambda n.\, n^3 \\ \{\$\mathit{cost}(\#\mathit{vertices}(g))\}\ (\mathsf{bellmanford}\ g)\ \{\ldots\} \end{cases}$$

To prove that the former specification implies the latter, one instantiates m with n<sup>2</sup>, that is, one exploits a composition lemma (Sect. 3.4). In practice, we publish both specifications and let clients use whichever one is more convenient.
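The O(nm) bound reflects the (n − 1) · m relaxation steps the algorithm performs. A small Python model (the graph and names are illustrative, not the verified implementation) makes the count explicit, along with the m ≤ n<sup>2</sup> composition:

```python
def bellman_ford(n, edges, src):
    """Single-source shortest paths; edges is a list of (u, v, weight).
    Returns the distance array and the number of relaxation steps."""
    dist = [float("inf")] * n
    dist[src] = 0
    steps = 0
    for _ in range(n - 1):           # outer loop: n - 1 passes
        for (u, v, w) in edges:      # inner loop: all m edges
            steps += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist, steps

# A 4-vertex, 5-edge instance: exactly (n - 1) * m relaxation steps are
# performed, and since m <= n^2 without duplicate edges, steps <= n^3.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
dist, steps = bellman_ford(4, edges, 0)
assert dist == [0, 3, 1, 4]
assert steps == 3 * 5 and steps <= 4 ** 3
```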

**Union-Find.** Charguéraud and Pottier [11] use Separation Logic with Time Credits to verify the correctness and time complexity of a Union-Find implementation. For instance, they prove that the (amortized) concrete cost of find is 2α(n) + 4, where n is the number of elements. With a few lines of proof, we derive a specification where the cost of find is expressed in the form O(α(n)):

```
specO Z_filterType Z.le (fun n ⇒ alpha n) (fun cost ⇒
     ∀ D R V x, x \in D → triple (UnionFind_ml.find x)
       PRE (UF D R V \* $(cost (card D)))
       POST (fun y ⇒ UF D R V \* \[ R x = y ])).
```
Union-Find is a mutable data structure, whose state is described by the abstract predicate UF D R V. In particular, the parameter D represents the domain of the data structure, that is, the set of all elements created so far. Thus, its cardinality, card D, corresponds to n. This case study illustrates a situation where the cost of an operation depends on the current state of a mutable data structure.
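For readers who want a concrete (unverified) model, here is a minimal Python union-find with union by rank and path compression; the parent map plays the role of the domain D, so card D corresponds to `len(uf.parent)`:

```python
class UnionFind:
    def __init__(self):
        self.parent = {}   # the domain D: all elements created so far
        self.rank = {}

    def make(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        # Path compression: the concrete cost depends on the current state,
        # but is amortized O(alpha(card D)) when combined with union by rank.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

uf = UnionFind()
for x in range(4):
    uf.make(x)
uf.union(0, 1); uf.union(2, 3); uf.union(1, 3)
assert uf.find(0) == uf.find(3)
```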

### **7 Related Work**

Our work builds on top of Separation Logic [23] with Time Credits [2], which was first implemented in a verification tool and exploited by the second and third authors [11]. We refer the reader to their paper for a survey of the related work in the general area of formal reasoning about program complexity, including approaches based on deductive program verification and approaches based on automatic complexity analysis. In this section, we restrict our attention to informal and formal treatments of the O notation.

The O notation and its siblings are documented in several textbooks [7,15,20]. Out of these, only Howell [19,20] draws attention to the subtleties of the multivariate case. He shows that one cannot take for granted that the properties of the O notation, which are well-known in the univariate case, remain valid in the multivariate case. He states several properties which, at first sight, seem natural and desirable, then proceeds to show that they are inconsistent, so no definition of the O notation can satisfy them all. He then proposes a candidate notion of domination between functions whose domain is ℕ<sup>k</sup>. His notation, f ∈ Ô(g), is defined as the conjunction of f ∈ O(g) and f̂ ∈ O(ĝ), where the function f̂ is a "running maximum" of the function f, and is by construction monotonic. He shows that this notion satisfies all the desired properties, provided some of them are restricted by additional side conditions, such as monotonicity requirements.

In this work, we go slightly further than Howell, in that we consider functions whose domain is an arbitrary filtered type A, rather than necessarily ℕ<sup>k</sup>. We give a standard definition of O and verify all of Howell's properties, again restricted with certain side conditions. We find that we do not need Ô, which is fortunate, as it seems difficult to define f̂ in the general case where f is a function of domain A. The monotonicity requirements that we impose are not exactly the same as Howell's, but we believe that the details of these administrative conditions do not matter much, as all of the functions that we manipulate in practice are everywhere nonnegative and monotonic.

Avigad and Donnelly [3] formalize the O notation in Isabelle/HOL. They consider functions of type A → B, where A is arbitrary and B is an ordered ring. Their definition of "f = O(g)" requires |f(x)| ≤ c|g(x)| for every x, as opposed to "when x is large enough". Thus, they get away without equipping the type A with a filter. The price to pay is an overly restrictive notion of domination, except in the case where A is ℕ, where both ∀x and 𝕌x yield the same notion of domination; this is Brassard and Bratley's "threshold rule" [7]. Avigad and Donnelly suggest defining "f = O(g) eventually" as an abbreviation for ∃f′. (f′ = O(g) ∧ 𝕌x. f(x) = f′(x)). In our eyes, this is less elegant than parameterizing O with a filter in the first place.

Eberl [13] formalizes the Akra-Bazzi method [1,21], a generalization of the well-known Master Theorem [12], in Isabelle/HOL. He creates a library of Landau symbols specifically for this purpose. Although his paper does not mention filters, his library in fact relies on filters, whose definition appears in Isabelle's Complex library. Eberl's definition of the O symbol is identical to ours. That said, because he is concerned with functions of type ℕ → ℝ or ℝ → ℝ, he does not define product filters, and does not prove any lemmas about domination in the multivariate case. Eberl sets up a decision procedure for domination goals, like x ∈ O(x<sup>3</sup>), as well as a procedure that can simplify, say, O(x<sup>3</sup> + x<sup>2</sup>) to O(x<sup>3</sup>).

TiML [25] is a functional programming language where types carry time complexity annotations. Its type-checker generates proof obligations that are discharged by an SMT solver. The core type system, whose metatheory is formalized in Coq, employs concrete cost functions. The TiML implementation allows associating an O specification with each top-level function. An unverified component recognizes certain classes of recurrence equations and automatically applies the Master Theorem. For instance, *mergesort* is recognized to be O(mn log n), where n is the input size and m is the cost of a comparison. The meaning of the O notation in the multivariate case is not spelled out; in particular, which filter is meant is not specified.

Boldo *et al.* [4] use Coq to verify the correctness of a C program which implements a numerical scheme for the resolution of the one-dimensional acoustic wave equation. They define an ad hoc notion of "uniform O" for functions of type ℝ<sup>2</sup> → ℝ, which we believe can in fact be viewed as an instance of our generic definition of domination, at an appropriate product filter. Subsequent work on the Coquelicot library for real analysis [5] includes general definitions of filters, limits, little-o and asymptotic equivalence. A few definitions and lemmas in Coquelicot are identical to ours, but the focus in Coquelicot is on various filters on ℝ, whereas we are more interested in filters on ℤ<sup>k</sup>.

The tools RAML [17] and Pastis [8] perform fully automated amortized time complexity analysis of OCaml programs. They can be understood in terms of Separation Logic with Time Credits, under the constraint that the number of credits that exist at each program point must be expressed as a polynomial over the variables in scope at this point. The a priori unknown coefficients of this polynomial are determined by an LP solver. Pastis produces a proof certificate that can be checked by Coq, so the trusted computing base of this approach is about the same as ours. RAML and Pastis offer much stronger automation than our approach, but have weaker expressive power. It would be very interesting to offer access to a Pastis-like automated system within our interactive system.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Verified Learning Without Regret: From Algorithmic Game Theory to Distributed Systems with Mechanized Complexity Guarantees**

Samuel Merten(B), Alexander Bagnall, and Gordon Stewart

Ohio University, Athens, OH, USA *{*sm137907,ab667712,gstewart*}*@ohio.edu

**Abstract.** Multiplicative Weights (MW) is a simple yet powerful algorithm for learning linear classifiers, for ensemble learning à la boosting, for approximately solving linear and semidefinite systems, for computing approximate solutions to multicommodity flow problems, and for online convex optimization, among other applications. Recent work in algorithmic game theory, which applies a computational perspective to the design and analysis of systems with mutually competitive actors, has shown that no-regret algorithms like MW naturally drive games toward approximate Coarse Correlated Equilibria (CCEs), and that for certain games, approximate CCEs have bounded cost with respect to the optimal states of such systems.

In this paper, we put such results to practice by building distributed systems such as routers and load balancers with performance and convergence guarantees mechanically verified in Coq. The main contributions on which our results rest are (1) the first mechanically verified implementation of Multiplicative Weights (specifically, we show that our MW is no regret) and (2) a language-based formulation, in the form of a DSL, of the class of games satisfying Roughgarden smoothness, a broad characterization of those games whose approximate CCEs have cost bounded with respect to optimal. Composing (1) with (2) within Coq yields a new strategy for building distributed systems with mechanically verified complexity guarantees on the time to convergence to near-optimal system configurations.

**Keywords:** Multiplicative weights · Algorithmic game theory Smooth games · Interactive theorem proving · Coq

### **1 Introduction**

The Multiplicative Weights algorithm (MW, [1,25]) solves the general problem of "combining expert advice", in which an agent repeatedly chooses which action, or "expert", to play against an adaptive environment. The agent, after playing an action, learns from the environment both the cost of that action and of other actions it could have played in that round. The environment, in turn, may adapt in order to minimize environment costs. MW works by maintaining a weighted distribution over the action space, in which each action initially has equal weight, and by updating weights with a linear or exponential loss function to penalize poorly performing actions.

MW is a *no-regret* algorithm: its expected cost approaches that of the best fixed action the agent could have chosen in hindsight (i.e., external regret tends to zero) as time t → ∞. Moreover, this simple algorithm performs remarkably well: in a number of rounds logarithmic in the size of the action space, MW's expected regret can be bounded by a small constant ε (MW has bounded external regret). In [1], Arora, Hazan, and Kale showed that MW has wide-ranging connections to numerous problems in computer science, including optimization, linear and semidefinite programming, and machine learning (cf. boosting [14]).
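To make the no-regret behavior concrete, the linear-update variant of MW can be sketched in a few lines of Python. This is an illustrative model, not our verified Ssreflect implementation; the cost sequence and the per-round regret threshold η + (ln k)/(ηT) are chosen for the example:

```python
import math

def mw(costs, eta):
    """Multiplicative Weights with the linear update w_i *= (1 - eta * c_i).
    `costs` is a T x K matrix of per-round action costs in [0, 1]; returns
    MW's total expected cost and the total cost of the best fixed action."""
    k = len(costs[0])
    w = [1.0] * k
    total = 0.0
    for round_costs in costs:
        z = sum(w)  # normalize weights into a distribution over actions
        total += sum(wi / z * ci for wi, ci in zip(w, round_costs))
        w = [wi * (1.0 - eta * ci) for wi, ci in zip(w, round_costs)]
    best_fixed = min(sum(col) for col in zip(*costs))
    return total, best_fixed

# Action 0 is consistently cheaper, so MW shifts weight onto it and its
# per-round external regret stays below eta + ln(k)/(eta * T) here.
T, eta = 200, 0.1
total, best = mw([[0.1, 0.9]] * T, eta)
assert (total - best) / T < eta + math.log(2) / (eta * T)
```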

Our work targets another important application of MW: the approximate solution of multi-agent games, especially as such games relate to the construction of distributed systems. It is well known (cf. [30, Chapter 4]) that no-regret algorithms such as MW converge, when played by multiple independent agents, to a large equilibrium class known as Coarse Correlated Equilibria (CCEs). CCEs may not be socially optimal, but for some games, such as Roughgarden's smooth games [35], the social cost of such equilibrium states can be bounded by a constant factor of the optimal cost of the game (the game has bounded Price of Anarchy, or POA). Therefore, to drive the social cost of a smooth game to near optimal, it suffices simply to let each agent play a no-regret algorithm such as MW.

Moreover, a number of distributed systems can be encoded as games, especially when the task being distributed is viewed as an optimization problem. Consider, for example, distributed balancing of network flows over a set of web servers, an application we return to in Sect. 3. Assuming the set of flows is fixed, and that the cost of (or latency incurred by) assigning a flow to a particular web server increases as a function of the number of flows already assigned to that server (the traffic), then the load balancing application is encodable as a game in which each flow is a "player" attempting to optimize its cost (latency). An optimal solution of this game minimizes the total latency across all flows. Since the game is Roughgarden smooth (assuming affine cost functions), the social cost of its CCEs as induced by letting each player independently run MW is bounded with respect to that of an optimal solution.

### **1.1 Contributions**

In this paper, we put such results to work by building the first verified implementation of the MW algorithm – which we use to drive all games to approximate CCEs – and by defining a language-based characterization of a subclass of games called Roughgarden smooth games that have robust Price of Anarchy guarantees extending even to approximate CCEs. Combining our verified MW with smooth games, we construct distributed systems for applications such as routing and load balancing that have verified convergence and correctness guarantees.

Specifically, our main contributions are:

– the first mechanically verified implementation of Multiplicative Weights (Sect. 5), proved no regret;
– the Smooth Games DSL (Sects. 3 and 4), a language-based characterization of games satisfying Roughgarden smoothness, with POA bounds derived by construction; and
– composition theorems (Sect. 6) showing that these pieces together imply system-wide convergence and quality bounds.

By *verified*, we mean our MW implementation has mechanically checked convergence bounds and a proof of correctness within an interactive theorem prover (specifically, Ssreflect [16], an extension of the Coq [5] system). By *convergence* and *correctness*, we mean that we prove not only that MW produces the right answer (functional correctness with respect to a high-level functional specification), but also that it does so with external regret<sup>1</sup> bounded by a function of the number of iterations of the protocol (convergence). Convergence of MW in turn implies convergence to an approximate CCE. By composing this second convergence property with Roughgarden smoothness, we bound the social, or total, cost of the resulting system state with respect to the optimal.

As we've mentioned, MW has broad application across a number of subdisciplines of computer science, including linear programming, optimization, and machine learning. Although our focus in this paper is the use of MW to implement no-regret dynamics, a general strategy for computing the CCEs of multiagent games, our implementation of MW (Sect. 5.3) could be used to build, e.g., a verified LP solver or verified implementation of boosting as well.

*Limitations.* The approach we outline above does not apply to all distributed systems, nor even to all distributed systems encodable as games. In particular, in order to prove POA guarantees in our approach, the game encoding a particular distributed system must first be shown Roughgarden smooth, a condition which does not always apply (e.g., to network formation games [35, Section 2]). More positively, the Smooth Games DSL we present in Sects. 3 and 4 provides one method by which to explore the combinatorial nature of Roughgarden smoothness, as we demonstrate with some examples in Sect. 3.

*Relationship to Prior Work.* Some of the ideas we present in this paper previously appeared in summary form in a 3-page brief announcement at PODC 2017 [4]. The current paper significantly expands on the architecture of the Cage system, our verified implementation of Multiplicative Weights, the definition of the Smooth Games DSL, and the composition theorems of Sect. 6 proving that the pieces fit together to imply system-wide convergence and quality bounds.

<sup>1</sup> The expected (per-step) cost of the algorithm minus that of the best fixed action.

### **1.2 Organization**

The following section provides background on games, algorithmic game theory, and smoothness. Section 3 presents an overview of the main components of the Cage approach, via application to examples. Section 4 provides more detail on the combinators of our Smooth Games DSL. Section 5 presents our verified implementation of MW. Section 6 describes the composition theorems proving that multi-agent MW converges to near-optimal ε-CCEs. Sections 7 and 8 present related work and conclude.

### **2 Background**

### **2.1 Games**

Von Neumann, Morgenstern, and Nash [28,29] (in the US) and Bachelier, Borel, and Zermelo [3,8,43] (in Europe) were the first to study the mathematical theory of strategic interaction, modern game theory. Nash's famous result [27] showed that in all finite games, mixed-strategy equilibria (those in which players are allowed to randomize) always exist. Since the 1950s, game theory has had huge influence in numerous fields, especially economics.

In our context, a game is a tuple of a finite type A (the strategy space) and a cost function C<sub>i</sub> mapping tuples of strategies of type A<sub>1</sub> × A<sub>2</sub> × ... × A<sub>N</sub> to values of type ℝ, the cost to player i of state (a<sub>1</sub>, ..., a<sub>i</sub>, ..., a<sub>N</sub>). For readers interested in formalization-related aspects, Listing 1 provides additional details.

### **Listing 1: Games in Ssreflect-Coq**

In Ssreflect-Coq, an extension of the standard Coq system, a finite type A : finType pairs the type A with an enumerator enum : list A such that for all a : A, count a enum = 1 (every element is included exactly once). To define games, we use operational type classes [38], which facilitate parameter sharing:

**Class** game (A : finType) (N : nat) (R : realFieldType) '(costClass : CostClass N R A) : Type := {}.

costClass declares the cost function C<sub>i</sub>, and N is the number of players.

A state s : A<sub>1</sub> × A<sub>2</sub> × ... × A<sub>N</sub> is a *Pure Nash Equilibrium (PNE)* when no player i ∈ [1, N] has incentive to change its strategy: ∀s′<sub>i</sub>. C<sub>i</sub>(s) ≤ C<sub>i</sub>(s′<sub>i</sub>, s<sub>−i</sub>). Here s′<sub>i</sub> is an arbitrary strategy. Strategy s<sub>i</sub> is player i's move in state s. By s′<sub>i</sub>, s<sub>−i</sub> we denote the state in which player i's strategy is s′<sub>i</sub> and all other players play as in s. In other words, no player can decrease its cost by unilateral deviation.
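For small finite games, the PNE condition can be checked by brute-force enumeration of unilateral deviations. The hypothetical Python helper below does exactly that, on a toy two-player shared-resource game:

```python
from itertools import product

def is_pne(s, strategies, cost):
    """s is a PNE iff no player can lower its cost by a unilateral deviation."""
    return all(
        cost(i, s) <= cost(i, s[:i] + (d,) + s[i + 1:])
        for i in range(len(s)) for d in strategies)

# Two players sharing one resource: pay the traffic if you use it, else 0.
def cost(i, s):
    return sum(s) if s[i] == 1 else 0

pnes = [s for s in product((0, 1), repeat=2) if is_pne(s, (0, 1), cost)]
# Using the resource alone costs 1 > 0, so in this toy game only the
# empty profile (0, 0) is a pure Nash equilibrium.
assert pnes == [(0, 0)]
```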

Pure-strategy Nash equilibria do not always exist. Mixed Nash Equilibria (MNE), which *do* exist in all finite games, permit players to randomize over the strategy space, by playing a distribution σ<sup>i</sup> over A. The overall state is the product distribution over the player distributions. Every PNE is trivially an MNE, by letting players choose deterministic distributions σi.

Correlated Equilibria (CEs) generalize MNEs to situations in which players coordinate via a trusted third party. In what follows, we'll mostly be interested in a generalization of CEs, called *Coarse Correlated Equilibria (CCEs)*, and their approximate relaxations. Specifically, a distribution σ over A<sup>N</sup> (Listing 2) is a CCE when ∀i ∀s′<sub>i</sub>. E<sub>s∼σ</sub>[C<sub>i</sub>(s)] ≤ E<sub>s∼σ</sub>[C<sub>i</sub>(s′<sub>i</sub>, s<sub>−i</sub>)]. E<sub>s∼σ</sub>[C<sub>i</sub>(s)] is the expected cost to player i in distribution σ. The CCE condition states that there is no s′<sub>i</sub> that could decrease player i's expected cost. CCEs are essentially a relaxation of MNEs which does not require σ to be a product distribution (i.e., the players' strategies may be correlated). CEs are a subclass of CCEs in which E<sub>s∼σ</sub>[C<sub>i</sub>(s′<sub>i</sub>, s<sub>−i</sub>)] may be conditioned on s<sub>i</sub>.

A distribution σ over states may only be *approximately* a CCE. Define as ε-approximate those CCEs σ for which ∀i ∀s′<sub>i</sub>. E<sub>s∼σ</sub>[C<sub>i</sub>(s)] ≤ E<sub>s∼σ</sub>[C<sub>i</sub>(s′<sub>i</sub>, s<sub>−i</sub>)] + ε. Moving to s′<sub>i</sub> can decrease player i's expected cost, but only by at most ε.

### **Listing 2: Discrete Distributions in Ssreflect-Coq**

Since our games A are finite, discrete distributions suffice to formalize MNEs, CEs, and CCEs. We model such distributions as finite functions (those with finite domain) from the strategy space A to R:

**Record** dist (A : finType) : Type := mkDist { pmf :> {ffun A → R}; dist_ax : dist_axiom pmf }.

Here {ffun A → R} is Ssreflect syntax for the type of finite functions from A to R. The second projection of the record, dist_ax, asserts that pmf represents a valid distribution: pmf is positive and Σ<sub>a:A</sub> pmf a = 1. The Coq predicate eCCE:

```
Definition eCCE (ε : R) (σ : dist A^N) : Prop :=
  ∀ (i : [0..N − 1]) (s : A),
  expectedCost i σ ≤ expectedUnilateralCost i σ s + ε.
```
states that distribution σ (over N-tuples of strategies A, one per player) is an ε-approximate CCE, or ε-CCE.
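The eCCE predicate has a direct computational analogue for small finite games. The following Python checker (illustrative names, not our Coq code) evaluates both expectations by enumeration, on the shared-resource game from before:

```python
def eps_cce(sigma, strategies, cost, eps):
    """sigma maps joint states (tuples) to probabilities; checks that for
    every player i and deviation d, E[C_i(s)] <= E[C_i(d, s_{-i})] + eps."""
    n = len(next(iter(sigma)))
    for i in range(n):
        expected = sum(p * cost(i, s) for s, p in sigma.items())
        for d in strategies:
            deviated = sum(p * cost(i, s[:i] + (d,) + s[i + 1:])
                           for s, p in sigma.items())
            if expected > deviated + eps:
                return False
    return True

def cost(i, s):   # shared-resource cost: pay the traffic iff you use it
    return sum(s) if s[i] == 1 else 0

# "Exactly one player uses the resource", correlated via a coin flip:
sigma = {(1, 0): 0.5, (0, 1): 0.5}
assert eps_cce(sigma, (0, 1), cost, eps=0.5)        # a 0.5-CCE ...
assert not eps_cce(sigma, (0, 1), cost, eps=0.1)    # ... but not a 0.1-CCE
```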

### **2.2 Algorithmic Game Theory**

Equilibria are only useful if we're able to quantify, with respect to the game being analyzed:

– the quality of the equilibria (how their social cost compares to that of optimal states), and
– how quickly the dynamics of the game converge to equilibrium.

Algorithmic game theory and the related fields of mechanism design and distributed optimization provide excellent tools here.

*Good Equilibria.* The *Price of Anarchy*, or POA, of game (A, C) quantifies the cost of equilibrium states of (A, C) with respect to optimal configurations. Precisely, define POA as the ratio of the social cost of the worst equilibrium s to the social cost of an optimal state s∗. POA near 1 indicates high-quality equilibria: finding an equilibrium in such a game leads to overall social cost close to optimal. Prior work in algorithmic game theory has established nontrivial POA bounds for a number of game classes: on various classes of congestion and routing games [2,6,10], on facility location games [40], and others [11,32].

In the system of Sect. 3, we use the related concept of *Roughgarden smooth games* [35], or simply *smooth games*, which define a subclass of games with canonical POA proofs. To each smooth game are associated two constants, λ and μ. The precise definition of the smoothness condition is less relevant here than its consequences: if a cost-minimization game is (λ, μ)-smooth, then it has POA λ/(1−μ). Not all games are smooth, but for those that are, the POA bound above extends even to CCEs and their approximations, a particularly large (and therefore tractable) class of equilibria [35, Sects. 3 and 4].

*Tractable Dynamics.* Good equilibrium bounds are most useful when we know how quickly a particular game converges to equilibrium [7,9,12,13,17]. Certain classes of games, e.g. potential games [26], reach equilibria under a simple model of dynamics called best response. As we've mentioned, we use a different distributed learning algorithm in this work, variously called Multiplicative Weights (MW) [1] or sometimes Randomized Weighted Majority [25], which drives *all* games to CCEs, a larger class of equilibrium states than those achieved by potential games under best response.

### **3 Cage by Example**

No-regret algorithms such as MW can be used to drive multi-agent systems toward the ε-CCEs of arbitrary games. Although the CCEs of general games may have high social cost, those of *smooth* games, as identified by Roughgarden [35], have robust Price of Anarchy (POA) bounds that extend even to ε-CCEs. Figure 1 depicts how these pieces fit together in the high-level architecture of our Cage system, which formalizes the results of Sect. 2 in Coq. Shaded boxes are program-related components while white boxes are proof-related.

### **3.1 Overview**

At the top, we have a domain-specific language in Coq (DSL, box 1) that generates games with automatically verified POA bounds. To execute such games, we have verified (also in Coq) an implementation of the Multiplicative Weights algorithm (MW, 2). Correctness of MW implies convergence bounds on the games it executes: O((ln |A|)/ε<sup>2</sup>) iterations suffice to drive the game to an ε-CCE (here, |A| is the size of the action space, or game type, A).

**Fig. 1.** System architecture

We compose N instances of multiplicative weights (4), one per agent, with a server (3) that facilitates communication, implemented in OCaml and modeled by an operational semantics in Coq. To actually execute games, we use Coq's code extraction mechanism to generate OCaml code that runs clients against the server, using an unverified OCaml shim to send and receive messages. We prove performance guarantees in Coq from POA bounds on the game and from the regret bound on MW.

#### **3.2 Smooth Games DSL**

The combinators exposed by the Smooth Games DSL operate over game types A, cost functions C, and smoothness parameters λ and μ. Basic combinators in this language include (i) Resource and (ii) Unit games, the first for coordinating access to shared resources under congestion and the second with fixed cost 0. Combinators that take other games as arguments include:


– the singleton game Singleton(A), which has cost 1 if player i "uses" the underlying resource (B<sub>Resource</sub>(f i) = true), and 0 otherwise. The function B<sub>−</sub>(−) generalizes the notion of resource usage beyond the primitive Resource game. For example, B<sub>Scalar(A,m)</sub>(x) = B<sub>A</sub>(x): usage in a game built from the scalar combinator reduces to usage in the underlying game.

### **3.3 Example: Distributed Routing**

We illustrate the Smooth Games DSL with an example: distributed routing over networks with affine latency functions (Fig. 2). This game is known to have POA 5/2 [35].

In a simple version of the game, N routing agents each choose a path from a global source vertex s to a global sink vertex t. Latency over edge e, modeled by an affine cost function c<sub>e</sub>(x) = a<sub>e</sub>x + b<sub>e</sub>, scales in the amount of traffic x over that edge. An optimal solution minimizes the total cost to all agents.

We model each link in the network as a Resource game, which in its most basic form is defined by the following inductive datatype:

**Inductive** Resource : Type := | RYes : Resource | RNo : Resource.

RYes indicates the agent chose to use the resource (a particular edge) and RNo otherwise. The cost function for Resource is defined by:

**Definition** ResourceCostFun (i : [0..N − 1]) (s : [0..N − 1] →<sub>fin</sub> Resource) : R := **if** s i **is** RYes **then** traffic s **else** 0.

in which s is a map from agent labels to resource strategies and traffic s is the total number of agents that, in state s, chose to use the resource. An agent pays traffic s if it uses the resource, otherwise 0. We implement Resource as a distinct inductive type, even though it's isomorphic to bool, to ensure that types in the Smooth Games DSL have unique game instances. To give each resource the more interesting cost function c<sub>e</sub>(x) = a<sub>e</sub>x + b<sub>e</sub>, we compose Resource with a second combinator, Affine(a<sub>e</sub>, b<sub>e</sub>, Resource), which has cost 0 if an agent does not use the resource, and cost a<sub>e</sub> ∗ (traffic s) + b<sub>e</sub> otherwise. This combinator preserves (λ, μ)-smoothness assuming λ + μ ≥ 1, a side condition which holds for Resource games.
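The cost functions of Resource and Affine(a, b, Resource) can be mirrored in a few lines of Python, as an informal model of the Coq definitions above (strategies are encoded as 1 for RYes and 0 for RNo):

```python
def traffic(s):
    # Number of agents that chose to use the resource (RYes ~ 1, RNo ~ 0)
    return sum(s)

def resource_cost(i, s):
    # ResourceCostFun: pay the total traffic iff you use the resource
    return traffic(s) if s[i] == 1 else 0

def affine_cost(a, b, i, s):
    # Affine(a, b, Resource): a * traffic(s) + b if used, 0 otherwise
    return a * traffic(s) + b if s[i] == 1 else 0

s = (1, 1, 0)                        # agents 0 and 1 use the edge; 2 does not
assert resource_cost(0, s) == 2
assert affine_cost(3, 1, 0, s) == 7  # 3 * 2 + 1
assert affine_cost(3, 1, 2, s) == 0
```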

We encode m affine resources by applying Affine to Resource m times, then folding under product:

```
T := Affine(a1,b1,Resource)
   × Affine(a2,b2,Resource)
   × ...
   × Affine(am,bm,Resource)
```
The associated cost function is the sum of the individual resource cost functions.

Values of type T may assign RYes to a subset of resources that doesn't correspond to a valid path in a graph G = (V, E). To prevent this behavior, we apply to T the subtype combinator Σ, specialized to a predicate isValidPath(G, s, t) enforcing that strategies (r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>|E|</sub>) correspond to valid paths from s to t: T' := Σ<sub>isValidPath(G,s,t)</sub>(T). The game T' is (5/3, 1/3)-smooth, just like the underlying Resource game, which implies a POA of (5/3)/(1 − 1/3) = 5/2.

#### **3.4 Example: Load Balancing**

As a second example, consider the load balancing game depicted in Fig. 3, in which a number of network flows are distributed over several servers with affine cost functions. In general, N load balancing agents are responsible for distributing M flows over K servers. The cost of allocating a flow to a server is modeled by an affine cost function which scales in the total load (number of flows) on that server. Like routing, the load balancing game has POA 5/2. This is no coincidence; both are special cases of "finite congestion games", a class of games which have POA 5/2 when costs are linear [10]. The connection between them can be seen more concretely by observing that they are built up from the same primitive Resource game.

**Fig. 3.** Load balancing game

We model the system as an NM-player K-resource game in which each player corresponds to a single network flow. Each load balancing agent poses as multiple players (MW instances) in the game, one per flow, and composes the actions chosen by these players to form its overall strategy. The result of running the game is an approximate CCE with respect to the distribution of flows over servers.

Each server is defined as a Resource with an affine cost function, using the same data type and cost function as in the routing example. Instead of isValidPath, we use a new predicate exactlyOne to ensure that each network flow is assigned to exactly one server.

### **4 Smooth Games**

Roughgarden smoothness [35] characterizes a subclass of games with canonical Price of Anarchy (POA) proofs. In [35], Roughgarden showed that smooth games have canonical POA bounds not only with respect to pure Nash equilibria but also with respect to mixed Nash equilibria, correlated equilibria, CCEs, and their approximate relaxations. In the context of Cage, we use smoothness to bound the social cost of games executed by multiple clients each running MW. We show how the technical pieces fit together, in the form of bounds on an operational semantics of the entire Cage system, in Sect. 6. This section introduces the technical definition of smoothness and the language of combinators, or Smooth Games DSL, of Sect. 3 that we use to build games that are smooth by construction (Fig. 4).

*Syntax*

Scalars m, b; Predicates P
Game types A, B ::= Resource | Unit | Bias(A, b) | Scalar(A, m) | A × B | {x : A, P(x)} | Singleton(A)

*Judgment* ⊢<sub>(λ,μ)</sub> (A, C), read "Game (A, C) is (λ, μ)-smooth."

– ResourceSmooth: ⊢<sub>(5/3, 1/3)</sub> (Resource, ResourceCostFun)
– UnitSmooth: ⊢<sub>(1,0)</sub> (Unit, fun i f. 0)
– SingletonSmooth: from ⊢<sub>(λ,μ)</sub> (A, C), conclude ⊢<sub>(1,0)</sub> (Singleton(A), fun i f. if B<sub>A</sub>(f i) then 1 else 0)
– SigmaSmooth: from ⊢<sub>(λ,μ)</sub> (A, C), conclude ⊢<sub>(λ,μ)</sub> ({x : A, P(x)}, fun i f. C<sub>i</sub> (fun j. (f j).1))
– BiasSmooth: from ⊢<sub>(λ,μ)</sub> (A, C), 1 ≤ λ + μ, and 0 ≤ b, conclude ⊢<sub>(λ,μ)</sub> (Bias(A, b), fun i f. C<sub>i</sub> f + b)
– ScalarSmooth: from ⊢<sub>(λ,μ)</sub> (A, C) and 0 ≤ m, conclude ⊢<sub>(λ,μ)</sub> (Scalar(A, m), fun i f. m ∗ C<sub>i</sub> f)
– ProductSmooth: from ⊢<sub>(λA,μA)</sub> (A, C<sup>A</sup>) and ⊢<sub>(λB,μB)</sub> (B, C<sup>B</sup>), conclude ⊢<sub>(max(λA,λB), max(μA,μB))</sub> (A × B, fun i f. C<sup>A</sup><sub>i</sub> f + C<sup>B</sup><sub>i</sub> f)

**Fig. 4.** Smooth games DSL

**Definition 1 (Smoothness).** *A game* (A, C) *is* (λ, μ)*-smooth if for any two states* s, s<sup>∗</sup> : A<sup>N</sup> *, the following inequality holds:*

$$\sum_{i=1}^{N} C_i(s_i^*, s_{-i}) \le \lambda \cdot C(s^*) + \mu \cdot C(s).$$

Here, C_i(s_i^∗, s_{−i}) denotes the individual cost to player i in the mixed state in which all other players follow their strategies from s while player i follows the corresponding strategy from s∗. Smooth games bound the individual cost of players' unilateral deviations from state s to s∗ by the weighted social costs of s and s∗. In essence, when λ and μ are small, any single player's deviation from a given state has minimal effect on that player's cost.
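Smoothness of a small game can be checked exhaustively. The following sketch (a hypothetical Python encoding, not from the paper) brute-forces the smoothness inequality for a one-resource game with three players, where a state is a bit vector marking which players use the resource and each user pays the total traffic.

```python
from itertools import product
from fractions import Fraction

# Brute-force check that the one-resource game with N = 3 players is
# (5/3, 1/3)-smooth.  A state is a bit vector: s[i] = 1 iff player i uses
# the resource; each user pays the total traffic, so C(s) = traffic(s)^2.

N = 3
lam, mu = Fraction(5, 3), Fraction(1, 3)

def traffic(s): return sum(s)
def cost_i(i, s): return traffic(s) if s[i] else 0
def cost(s): return sum(cost_i(i, s) for i in range(N))

def deviate(s, t, i):            # the mixed state (t_i, s_{-i})
    u = list(s); u[i] = t[i]; return tuple(u)

states = list(product((0, 1), repeat=N))
smooth = all(
    sum(cost_i(i, deviate(s, t, i)) for i in range(N))
    <= lam * cost(t) + mu * cost(s)
    for s in states for t in states
)
print(smooth)  # True
```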

The smoothness inequality leads to natural proofs of POA for a variety of equilibrium classes. As an example, consider the following bound on the expected cost of ε-CCEs of (λ, μ)-smooth games:

**Lemma** smooth_eCCE (d : dist (state N T)) (s′ : state N T) (ε : R) : eCCE ε d → optimal s′ → ExpectedCost d ≤ λ∗(Cost s′) + μ∗(ExpectedCost d) + N∗ε.

ExpectedCost d is the sum for all players i of the expected cost to player i of distribution d. N is the number of players in the game.

The smooth_eCCE bound implies the following Price of Anarchy bound on the expected cost, summed across all players, of distribution d:

$$\begin{array}{l} \mathsf{Lemma\ smooth\_POA}\ \epsilon\ (d : \mathsf{dist}\ (\mathsf{state}\ N\ T))\ s' : \\ \quad \mathsf{eCCE}\ \epsilon\ d \rightarrow \mathsf{optimal}\ s' \rightarrow \\ \quad \mathsf{ExpectedCost}\ d \le \lambda/(1-\mu) * (\mathsf{Cost}\ s') + (N*\epsilon)/(1-\mu). \end{array}$$

If d is an ε-CCE, then its cost is no more than λ/(1 − μ) times the optimal cost of s′, plus an additional term that scales in the number of players N. For example, for concrete values λ = 5/3, μ = 1/3, ε = 0.0375, and N = 5, we get multiplicative approximation factor λ/(1 − μ) = 5/2 and additive factor 0.28. A value of ε = 0.0375 is reasonable; as Sect. 5 will show, it takes fewer than 20,000 iterations of the Multiplicative Weights algorithm, in a game with strategy space of size 1000, to produce ε ≤ 0.0375.
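The concrete figures above can be recomputed with exact rational arithmetic (an illustrative check, independent of the paper's Coq proofs):

```python
from fractions import Fraction

# Recomputing the concrete Price-of-Anarchy figures from the text:
# lam = 5/3, mu = 1/3, eps = 0.0375, N = 5.
lam, mu = Fraction(5, 3), Fraction(1, 3)
eps, N = Fraction(3, 80), 5          # 3/80 = 0.0375

mult = lam / (1 - mu)                # multiplicative factor lam/(1 - mu)
add = (N * eps) / (1 - mu)           # additive term N*eps/(1 - mu)
print(mult, float(add))              # 5/2 0.28125
```

The additive factor is exactly 9/32 = 0.28125, which the text rounds to 0.28.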

### **4.1 Combinators**

Figure 4 lists the syntax and combinators of the Smooth Games DSL we used in Sect. 3 to build smooth routing and load balancing games.

The smoothness proof accompanying the judgment of Resource games is the least intuitive, and provides some insight into the behavior of smooth games. The structure of our proof borrows from a stronger result given by Roughgarden [35]: smoothness for resource games with affine cost functions and multiple resources. The key step is the following inequality first noted by Christodoulou and Koutsoupias [10]:

$$y(z+1) \le \frac{5}{3}y^2 + \frac{1}{3}z^2$$

for non-negative integers y and z. We derive (5/3, 1/3)-smoothness of Resource games from the following inequalities:

$$\sum_{i=0}^{N-1} C_i(s_i^*, s_{-i}) \le (\texttt{traffic}\ s^*) \cdot (\texttt{traffic}\ s + 1) \tag{1}$$

$$(\texttt{traffic}\ s^*) \cdot (\texttt{traffic}\ s + 1) \le \frac{5}{3} \cdot (\texttt{traffic}\ s^*)^2 + \frac{1}{3} \cdot (\texttt{traffic}\ s)^2 \tag{2}$$

$$(\texttt{traffic}\ s^*) \cdot (\texttt{traffic}\ s + 1) \le \frac{5}{3} \cdot C(s^*) + \frac{1}{3} \cdot C(s) \tag{3}$$

$$\sum_{i=0}^{N-1} C_i(s_i^*, s_{-i}) \le \frac{5}{3} \cdot C(s^*) + \frac{1}{3} \cdot C(s) \tag{4}$$

The inequality in step 1 holds because the cost per player in state s∗ is at most traffic s + 1, and there are exactly traffic s∗ players incurring such cost; i.e., (traffic s∗) · (traffic s + 1) is the number of nonzero terms times the upper bound on each term. The substitution in step 3 comes from the fact that in any state s, C(s) = (traffic s)²; each of the m players using the resource incurs cost m.
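The key inequality used in step 2 can be spot-checked exhaustively over a finite range (an illustrative check; the inequality of Christodoulou and Koutsoupias holds for all non-negative integers):

```python
from fractions import Fraction

# Exhaustively checking y(z+1) <= (5/3)y^2 + (1/3)z^2 with exact rationals
# for 0 <= y, z < 50; equality holds, e.g., at (y, z) = (1, 1) and (1, 2).
ok = all(
    y * (z + 1) <= Fraction(5, 3) * y**2 + Fraction(1, 3) * z**2
    for y in range(50) for z in range(50)
)
print(ok)  # True
```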

The proofs of smoothness for the other combinators are straightforward. For example, since Unit games always have cost 0, all values of λ and μ satisfy the smoothness inequality: 0 ≤ λ · 0 + μ · 0. We restrict the range of the cost function in SingletonSmooth games to {0, 1} by applying the function B_A(·), which generalizes the notion of "using a resource" to all the game types of Fig. 4. Smoothness of the Singleton game follows by case analysis on the results of B_A(·) in the states s and s∗ of the smoothness inequality. The games produced by the SigmaSmooth combinator have costs equal to those of the underlying games but restrict the domain to those states satisfying a predicate P. Since the smoothness bound of the underlying game holds for all states in A, the same bound holds on the restricted domain of states a ∈ A drawn from P. Smoothness of product games relies on the fact that smoothness still holds if λ and μ are replaced with larger values. Thus, each of the argument games to ProductSmooth is (max(λ_A, λ_B), max(μ_A, μ_B))-smooth. The overall product game, which sums the costs of its argument games, is (max(λ_A, λ_B), max(μ_A, μ_B))-smooth as well.

It is possible to derive new combinators from those defined in Fig. 4. For example, define as Affine(m, b, A) the game with cost function mx + b. We implement this game as {p : Scalar(A, m) × Scalar(Singleton(A), b), p.1 = p.2}: the subset of the product game over the scalar game Scalar(A, m) and the scaled {0, 1}-valued Singleton game over b in which the first and second projections of each strategy p are equal.
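The way the combinators propagate smoothness parameters can be mirrored in a toy calculation (an assumed Python encoding, not the paper's Coq; `scalar` and `product` here track only the (λ, μ) pair, not the games themselves):

```python
from fractions import Fraction

# Toy mirror of Fig. 4's parameter propagation: Resource is (5/3, 1/3)-
# smooth, Unit is (1, 0)-smooth, Scalar preserves parameters (for m >= 0),
# and Product takes pointwise maxima of the parameters.

RESOURCE = (Fraction(5, 3), Fraction(1, 3))
UNIT = (Fraction(1), Fraction(0))

def scalar(params, m):
    assert m >= 0                    # side condition of ScalarSmooth
    return params                    # parameters are unchanged

def product(p1, p2):
    (lA, mA), (lB, mB) = p1, p2
    return (max(lA, lB), max(mA, mB))

# A product of a scaled Resource game with a Unit game keeps (5/3, 1/3):
print(product(scalar(RESOURCE, 2), UNIT))  # (Fraction(5, 3), Fraction(1, 3))
```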

### **5 Multiplicative Weights (MW)**

At the heart of the Cage architecture of Sect. <sup>3</sup> lies our verified implementation of the Multiplicative Weights algorithm. In this section, we present the details of the algorithm and sketch its convergence proof. Section 5.3 presents our verified MW implementation and mechanized proof of convergence.

$$\begin{array}{l|l}
\textit{Client} & \textit{Environment} \\
\hline
\text{For all } a \in A, \text{ initialize } w_1(a) = 1. & \text{For time } t \in [1 \ldots T]: \\
\text{For time } t \in [1 \ldots T]: & \quad \text{Choose cost vector } c_t. \\
\quad \text{Let } \Gamma_t \triangleq \textstyle\sum_{a \in A} w_t(a). & \\
\quad \text{Play strategy } p_t(a) = w_t(a)/\Gamma_t. & \\
\quad \text{Update weights } w_{t+1}(a) \triangleq w_t(a) * (1 - \eta * c_t(a)). & \\
\end{array}$$

**Fig. 5.** Multiplicative Weights (MW)

#### **5.1 The Algorithm**

The MW algorithm (Fig. 5) pits a client, or agent, against an adaptive environment. The agent maintains a weight distribution w over the action space, initialized to give each action equal weight. At each time step t ∈ [1…T], the agent commits to the distribution w_t/Γ_t, where Γ_t = Σ_{a∈A} w_t(a), communicating this mixed strategy to the environment. After receiving a cost vector c_t from the environment, the agent updates its weights w_{t+1} to penalize high-cost actions, at a rate determined by a learning constant η ∈ (0, 1/2]. Values of η close to 1/2 lead to higher penalties, and thus relatively less exploration of the action space.

The environment is typically adaptive, and may be implemented by a number of other agents also running instances of MW. The algorithm proceeds for a fixed number of epochs T, or until some bound on expected external regret (expected cost minus the cost of the best fixed action) is achieved. In what follows, we always assume that costs lie in the range [−1, 1]. Costs in an arbitrary but bounded range are also possible (with a concomitant relaxation of the algorithm's regret bounds), as are variations of MW to solve payoff maximization instead of cost minimization.
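The loop of Fig. 5 can be sketched in a few lines of Python (purely illustrative; the paper's verified version is a Coq program, and the names here are hypothetical):

```python
# A minimal sketch of the MW client of Fig. 5.  Costs are assumed to lie
# in [-1, 1] and eta in (0, 1/2].

def mult_weights(actions, cost_vectors, eta):
    w = {a: 1.0 for a in actions}                  # w_1(a) = 1
    plays = []
    for c in cost_vectors:                         # one round per cost vector
        gamma = sum(w.values())                    # Gamma_t
        plays.append({a: w[a] / gamma for a in actions})  # strategy p_t
        for a in actions:                          # penalize high-cost actions
            w[a] *= 1 - eta * c[a]
    return plays

# The environment always charges action 'x' full cost; MW shifts nearly
# all probability mass onto the free action 'y'.
plays = mult_weights(['x', 'y'], [{'x': 1.0, 'y': 0.0}] * 50, eta=0.5)
print(round(plays[-1]['y'], 3))  # 1.0
```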

#### **5.2 MW Is No Regret**

The MW algorithm converges reasonably quickly: to achieve expected regret at most ε, it suffices to run the algorithm O((ln |A|)/ε²) iterations, where |A| is the size of the action space [36, Chapter 17]. Regret can be driven arbitrarily small as the number of iterations approaches infinity. Bounded regret suffices to prove convergence to an approximate CCE, as [36] also shows.

In this section, we present a high-level sketch of the proof that MW is no regret. We follow [36, Chapter 17], which has additional details. At the level of the mathematics, our formal proof makes no significant departures from Roughgarden.

**Definition 2 (Per-Step External Regret).** *Let* a∗ *be the best fixed action in hindsight (i.e., the action with minimum cost given the cost vectors received from the environment) and let OPT* ≜ Σ_{t=1}^{T} c_t(a∗)*. The expected per-step external regret of MW is*

$$\left(\sum_{t=1}^{T} \zeta_t - OPT\right) \;/\; T.$$

The summed term is the cumulative expected cost of the algorithm over times t ∈ [1…T], where ζ_t denotes the expected cost at time t:

$$\zeta_t = \sum_{a \in A} p_t(a) \cdot c_t(a) = \sum_{a \in A} \frac{w_t(a)}{\Gamma_t} \cdot c_t(a)$$

To get per-step expected regret, we subtract the cumulative cost of a<sup>∗</sup> and divide by the number of time steps T.

**Theorem 1 (MW Has Bounded Regret).** *The algorithm of Fig. 5 has expected per-step external regret at most* η + (ln |A|)/(ηT)*.*

*Proof Sketch.* The proof of Theorem 1 uses a potential-function argument, with potential Φ_t equal to the sum of the weights Γ_t = Σ_{a∈A} w_t(a) at time t. It proceeds by relating the cumulative expected cost Σ_t ζ_t of the algorithm to *OPT*, the cost of the best fixed action, through the intermediate quantity Γ_{T+1}.

The proof additionally relies on the following two facts derived from the Taylor expansion ln(1 − x) = −x − x²/2 − x³/3 − ⋯ :

$$\begin{aligned} \ln(1-x) &\le -x, & x &< 1\\ -x - x^2 &\le \ln(1-x), & x &\le 1/2 \end{aligned}$$
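Both inequalities can be spot-checked numerically over a sample of their validity ranges (an illustrative check only; the proof uses them symbolically):

```python
import math

# Checking ln(1-x) <= -x for x < 1, and -x - x^2 <= ln(1-x) for x <= 1/2,
# over sample points in [-0.9, 0.5].
xs = [i / 1000 for i in range(-900, 501)]
ok1 = all(math.log(1 - x) <= -x for x in xs)
ok2 = all(-x - x * x <= math.log(1 - x) for x in xs)
print(ok1, ok2)  # True True
```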

By letting η = √(ln |A| / T) (cf. [36, Chapter 17]), it is possible to restate the regret bound of Theorem 1 as the following, arguably nicer, bound:

### **Corollary 1 (MW Is No Regret)**

$$\left(\sum_{t=1}^{T} \zeta_t - OPT\right) / \; T \le 2\sqrt{\ln \left|A\right| \; / \; T}$$

Here, the number of iterations T must be large enough to ensure that η = √(ln |A| / T) ≤ 1/2, thus ensuring that η ∈ (0, 1/2].
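The Corollary 1 bound can be observed empirically (an illustrative experiment, not part of the paper's mechanized proof; the cost model here is random costs in [0, 1], which satisfy the [-1, 1] assumption):

```python
import math, random

# Comparing MW's per-step regret against 2*sqrt(ln|A|/T) on random cost
# vectors, with eta = sqrt(ln|A|/T) as in Corollary 1.

random.seed(0)
A, T = range(10), 400
eta = math.sqrt(math.log(len(A)) / T)        # about 0.076 <= 1/2

w = {a: 1.0 for a in A}
exp_cost = 0.0
cum = {a: 0.0 for a in A}                    # cumulative cost per fixed action
for _ in range(T):
    c = {a: random.random() for a in A}
    gamma = sum(w.values())
    exp_cost += sum(w[a] / gamma * c[a] for a in A)
    for a in A:
        cum[a] += c[a]
        w[a] *= 1 - eta * c[a]

regret = (exp_cost - min(cum.values())) / T  # per-step external regret
print(regret <= 2 * math.sqrt(math.log(len(A)) / T))  # True
```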

#### **5.3 MW Architecture**

Our implementation and proof of MW (Fig. 6) were designed to be extensible. At a high level, the proof structure follows the program refinement methodology, in which a high-level mathematical but inefficient specification of MW (High-Level Functional Specification) is gradually made more efficient by a series of refinements to various features of the program (for example, by

 

**Fig. 6.** MW architecture

replacing an inefficient implementation of a key-value map with a more efficient balanced binary tree).

For each such refinement, we prove that every behavior of the lower-level program is a possible behavior of the higher-level program it refines. Thus specifications proved for all behaviors of the high-level program also apply to each behavior at the low level. By behavior here, we mean the trace of action distributions output by MW as it interacts with, and receives cost vectors from, the environment.

We factor the lower implementation layers (Medium and Low) into an interpreter and operational semantics over a domain-specific language specialized to MW-style algorithms (MW DSL). The DSL defines commands for maintaining and updating weights tables as well as commands for interacting with the environment. We prove, for any DSL program c, that the interpretation of that program refines its behavior with respect to the small-step operational semantics (Medium). Our overall proof specializes this general refinement to an implementation of MW as a command in the DSL, in order to relate that command's interpreted behavior to the high-level functional specification.

### **5.4 MW DSL**

The syntax and semantics of the MW DSL are given in Fig. 7. The small-step operational semantics (c, σ ⇒ c′, σ′) is parameterized by an environment oracle that defines functions for sending action distributions to the environment (oracle_send) and for receiving the resulting cost vectors (oracle_recv). The oracle will in general be implemented by other clients also running MW (Sect. 6) but is left abstract here to facilitate abstraction and reuse. The oracle is stateful: the type T of oracle states may be updated by both oracle_send and oracle_recv.

Most of the operational semantics rules are straightforward. In the MW-Step-Weights rule for updating the state's weights table, we make use of an auxiliary expression evaluation function E_−[−] (standard and therefore not shown in Fig. 7). The only other interesting rules are those for send and recv, which call oracle_send and oracle_recv respectively. In the relation oracle_recv, the first two arguments are treated as inputs (the input oracle state of type T and the channel) while the last two are treated as outputs (the cost vector of type A → Q and the output oracle state). In the relation oracle_send, the first three arguments are inputs while only the last (the output oracle state) is an output.

*Multiplicative Weights.* As an example of an MW DSL program, consider our implementation (Listing 1.1) of the high-level MW of Fig. 5. To the right of each program line, we give comments describing the effect of each command. The program is divided into three functions: mult_weights_init, which initializes the weights table to assign weight 1 to each action a in the action space A; mult_weights_body, which defines the body of the main loop of MW; and mult_weights, which simply composes mult_weights_init with mult_weights_body.

**Listing 1.1.** MW DSL Implementation of Multiplicative Weights

**Definition** mult_weights_init (A : Type) :=
  update (λ a : A ⇒ 1); (∗ For all a ∈ A, initialize w₁(a) = 1. ∗)
  send. (∗ Commit to the uniform distribution over actions. ∗)

**Definition** mult_weights_body (A : Type) :=
  recv; (∗ Block until agent receives cost vector c_t from environment. ∗)
  update (λ a : A ⇒ weight a ∗ (1 − η ∗ cost a)); (∗ Update weights. ∗)
  send. (∗ Commit to distribution w_t/Γ_t. ∗)

**Definition** mult_weights (A : Type) (n : N.t) :=
  mult_weights_init A; (∗ Initialize weights and commit to initial mixed strategy. ∗)
  iter n (mult_weights_body A). (∗ Do n iterations of the MW main loop. ∗)

The MW DSL contains commands and expressions that are specialized to MW-style applications. Consider the function mult_weights_body (line 5). It first receives a cost vector from the environment using the specialized recv command. At the level of the MW DSL, recv is somewhat abstract: the program does not specify, e.g., which network socket to use. Implementation details such as these are resolved by the MW interpreter, which we discuss below in Sect. 5.5.

After recv, mult_weights_body implements an update to its weights table as defined by the command update (λ a : A ⇒ weight a ∗ (1 − η ∗ cost a)). As an argument to the update, we embed a function from actions a ∈ A to expressions that defines how the weight of each action a should change at this step (time t + 1). The expressions weight a and cost a refer to the weight and cost, respectively, of action a at time t. The anonymous function term is defined in Ssreflect-Coq, the metalanguage in which the MW DSL is defined.

### **5.5 Interpreter**

To run MW DSL programs, we wrote an executable interpreter in Coq with type:

```
interp (c : com A) (s : cstate) : option cstate.
```
The type cstate defines the state of the interpreter after each step, and in general maps quite closely to the type of states σ used in the MW DSL operational semantics. It is given by the record:

Syntax

```
Binary operators ⊕ ::= + | − | ∗
     Expressions e ::= d | −e | weight a | cost a | η | e1 ⊕ e2
       Commands c ::= skip | update (λa : A ⇒ e) | c1; c2 | iter n c | recv | send
```
Environment Oracle

oracle_recv : T → oracle_chanty → (A → Q) → T → Prop
oracle_send : T → dist A → oracle_chanty → T → Prop

States σ ≜ {
  SCosts : A → Q; (∗ Current cost vector ∗)
  SCostsOk : ∀a. |SCosts a| ≤ 1;
  SPrevCosts : seq {c : A → Q | ∀a. |c a| ≤ 1}; (∗ Previous cost vectors ∗)
  SWeights : A → Q; (∗ Weights table ∗)
  SWeightsOk : ∀a. 0 < SWeights a;
  SEta : Q; (∗ The η parameter ∗)
  SEtaOk : 0 < SEta ≤ 1/2;
  SOutputs : seq (dist A); (∗ Committed distributions ∗)
  SChan : oracle_chanty; (∗ I/O channel ∗)
  SOracleSt : T (∗ Environment/oracle state ∗)
}.

Operational Semantics


**Fig. 7.** MW DSL syntax and operational semantics, parameterized by an environment oracle defining the type *T* of environment states and the functions oracle_recv and oracle_send for interacting with the environment. The type *A* is that of states in the underlying game.

**Record** cstate : Type := {
  SCosts : M.t Q; (∗ Current cost vector ∗)
  SPrevCosts : list (M.t Q); (∗ Previous cost vectors ∗)
  SWeights : M.t Q; (∗ Weights table ∗)
  SEta : Q; (∗ The η parameter ∗)
  SOutputs : list (A → Q); (∗ Committed distributions ∗)
  SChan : oracle_chanty; (∗ I/O channel ∗)
  SOracleSt : T (∗ Environment/oracle state ∗)
}.

At the level of cstates, we use efficient purely functional data structures such as AVL trees. For example, the type M.t Q denotes an AVL-tree map from actions A to rational numbers Q. In the small-step semantics state, by contrast, we model the weights table not as a balanced binary tree but as an Ssreflect-Coq finite function, of type {ffun A → Q}, which directly maps actions of type A to values of type Q.

To speed up computation on rationals, we use a dyadic representation q = n/2^d, which facilitates fast multiplication. We do exact arithmetic on dyadic Q instead of floating-point arithmetic to avoid floating-point precision error. Verification of floating-point error bounds is an interesting but orthogonal problem (cf. [31,34]).
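Why dyadics make multiplication fast is easy to see in a toy sketch (a hypothetical Python mirror of the representation, not the paper's Coq): multiplying n₁/2^{d₁} by n₂/2^{d₂} needs only an integer multiply and an exponent addition, with no gcd computation.

```python
from dataclasses import dataclass
from fractions import Fraction

# Sketch of the dyadic representation q = n/2^d.

@dataclass(frozen=True)
class Dyadic:
    n: int  # numerator
    d: int  # exponent; the value denoted is n / 2**d

    def __mul__(self, other):
        return Dyadic(self.n * other.n, self.d + other.d)

    def to_fraction(self):
        return Fraction(self.n, 2 ** self.d)

half, three_quarters = Dyadic(1, 1), Dyadic(3, 2)
print((half * three_quarters).to_fraction())  # 3/8
```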

The field SOutputs in the cstate record, a list of functions mapping actions a ∈ A to their probabilities, stores the history of weight distributions generated by the interpreter as send commands are executed. To implement commands such as send and recv, we parameterize our MW interpreter by an environment oracle, just as we did the operational semantics. The operations implemented by the interpreter environment oracle are functional versions of the operational semantics' oracle_send and oracle_recv:

oracle_send : ∀A : Type, T → A → oracle_chanty ∗ T
oracle_recv : ∀A : Type, T → oracle_chanty → list (A∗Q) ∗ T

The oracle state type T is provided by the implementation of the oracle, as in the operational semantics. The function oracle_send takes a state of type T and a value of type A as arguments and returns a pair of a channel of type oracle_chanty (on which to listen for a response from the environment) and a new oracle state of type T. The function oracle_recv takes as arguments the oracle state and channel and returns a list of (a, q) pairs, representing a cost vector over actions, along with the new oracle state.

### **5.6 Proof**

The top-level theorem proved of our high-level functional specification of MW is:

```
Theorem perstep_weights_noregret :
  (expCostsR − OPTR)/T ≤ η + ln size_A/(η∗T).
```
The expression expCostsR is the cumulative expected cost of MW on a sequence of cost vectors, or the sum, for each time t, of the expected cost of the MW algorithm at time t. OPTR is the cumulative cost over T rounds of the best fixed action. The number η (a dyadic rational required to lie in the range (0, 1/2]) is the learning parameter provided to MW, and ln size_A is the natural log of the size of the action space A. T is the number of time steps. In contrast to the interpreter and semantics of Sect. 5.3 (where we do exact arithmetic on dyadics), for reasoning and specification at the level of the proof we use Coq's real number library and real-valued functions such as square root and log.

By choosing η to equal sqrt (ln size_A / T), Corollary 1 showed that it is possible to restate the right-hand side of the inequality in perstep_weights_noregret as 2 ∗ sqrt (ln size_A / T), thus giving an arguably nicer bound. Since in our implementation of MW we require that η be a dyadic rational, we cannot implement η = sqrt (ln size_A / T) directly (sqrt (ln size_A / T) is in general irrational). We do, however, prove the following tight approximation for all values of η approaching sqrt (ln size_A / T):

**Lemma** perstep_weights_noregret' : ∀r : R. r ≠ −1 → η = (1+r)∗(sqrt (ln size_A / T)) → (expCostsR − OPTR)/T ≤ (1+r)∗(sqrt (ln size_A / T)) + (sqrt (ln size_A / T))/(1+r).

In the statement of this lemma, the r term quantifies the error (how far η is from its optimal value sqrt (ln size_A / T)). We require that r ≠ −1 to ensure that division by 1 + r is well-defined. The resulting bound approaches 2 ∗ sqrt (ln size_A / T) as r approaches 0.

*High-Level Functional Specification.* Our high-level functional specification of MW closely models the mathematical specification of MW given in Fig. 5. For example, the following four definitions:

**Definition** weights : Type := {ffun A → Q}.
**Definition** costs : Type := {ffun A → Q}.
**Definition** init_weights : weights := λ (_ : A) ⇒ 1.
**Definition** update_weights (w : weights) (c : costs) : weights := λ a : A ⇒ w a ∗ (1 − η ∗ c a).

construct the types of weight vectors (weights) and cost vectors (costs), represented as finite functions from A to Q; define the initial weight vector (init_weights), which maps all actions to weight 1; and define the MW weight update rule (update_weights). The recursive function:

```
Fixpoint weights_of (cs : seq costs) (w : weights) : weights :=
  if cs is c :: cs' then update_weights (weights_of cs' w) c else w.
```
defines the vector that results from using update weights to repeatedly update w with respect to cost vectors cs.

*Adaptive vs. Oblivious Adversaries.* In our high-level specification of MW, we parameterize functions like weights_of by a fixed sequence of cost vectors cs rather than model interaction with the environment, as is done in Fig. 5. An execution of our low-level interpreted MW, even against an adaptive adversary, is always simulatable by the high-level functional specification: record, during the low-level execution, the cost vectors produced by the adversary, as is done by the SPrevCosts field (Sect. 5.5), and then pass this sequence to weights_of. This strategy is quite similar to using backward induction to solve the MW game for an oblivious adversary.
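The simulation argument can be illustrated concretely (a Python sketch, not the paper's Coq proof; `replay` stands in for the role of weights_of, and the adversary here is a hypothetical one that always charges the heavier action):

```python
# An interactive MW run against an adaptive adversary produces the same
# weights as replaying the recorded cost vectors through a purely
# functional fold over the sequence.

ETA = 0.25

def update(w, c):
    return [wi * (1 - ETA * ci) for wi, ci in zip(w, c)]

def replay(cs, w):                 # functional spec over recorded costs
    for c in cs:
        w = update(w, c)
    return w

def adaptive_run(T):               # adversary reacts to the current weights
    w, recorded = [1.0, 1.0], []
    for _ in range(T):
        c = [1.0, 0.0] if w[0] >= w[1] else [0.0, 1.0]  # charge heavier action
        recorded.append(c)
        w = update(w, c)
    return w, recorded

w_live, cs = adaptive_run(10)
print(w_live == replay(cs, [1.0, 1.0]))  # True: replay matches
```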

*Connecting the Dots.* To connect the MW interpreter to the high-level specification, we prove a series of refinement theorems (technically, backward simulations). As example, consider:

```
Lemma interp_step_plus :
  ∀(a0 : A) (s : state A) (t t′ : cstate) (c : com A),
  interp c t = Some t′ →
  match_states s t →
  ∃c′ s′,
    final_com c′ ∧
    ((c′ = CSkip ∧ s = s′) ∨ step_plus a0 c s c′ s′) ∧
    match_states s′ t′.
```
which relates the behavior of the interpreter (interp c t) when run on an arbitrary command c in cstate t to our model of MW DSL commands as specified by the operational semantics.

To prove that the operational semantics correctly refines our high-level functional specification of MW (and therefore satisfies the regret bounds given at the start of Sect. 5.6), we prove a similar series of refinements. Since backward simulations compose transitively, we prove regret bounds on our interpreted MW just by composing the refinements in series. The bounds we prove in this way are parametric in the environment oracle with which MW is instantiated. When the oracle state types differ from source to target in a particular simulation, as is the case in our proof that the MW DSL interpreter refines the operational semantics, we require that the oracles simulate as well.

### **6 Coordinated MW**

A system of multiple agents each running MW yields an ε-CCE of the underlying game. If the game being played is smooth – for example, it was built using the combinators of the Smooth Games DSL of Sect. 4 – then the resulting ε-CCE has bounded social cost with respect to a globally optimal strategy. In this section, we put these results together by (1) defining an operational semantics of distributed interaction among multiple clients each running MW, and (2) proving that distributed executions of this semantics yield near-optimal solutions, as long as the underlying game being played is smooth.

### **6.1 Machine Semantics**

We model the evolution of the distributed machine by the operational semantics in Fig. 8. Client states (client_state) bundle commands from the MW DSL (Sect. 5) with MW states parameterized by the ClientPkg oracle. The client oracle send and receive functions model single-element (pin) queues, represented as values of type option (dist A), storing values sent by an MW node, and of type option (A → Q), storing values received by an MW node.

States of the coordinated machine (type machine_state N A) map client indices in range [0..N − 1] to client states (type client_state A). Machine states also record, at each iteration of the distributed MW protocol, the history of distributions received from the clients in that round (type seq ([0..N − 1] → dist A)), which will be used to prove Price of Anarchy bounds in the next section (Sect. 6.2). We say that all clients have sent in a particular machine state m, committing to the set of distributions f, if each client's received buffer is empty and its sent buffer contains the distribution f i, of type dist A.

Client Oracle

ClientPkg ≜ {
  sent : option (dist A);
  received : option (A → Q);
  received_ok : ∀v. received = Some v → ∀a. 0 ≤ v a ≤ 1
}

client_oracle_recv A (p : ClientPkg) (_ : unit) (v : A → Q) (p′ : ClientPkg) ≜
  p.received = Some v ∧ p′.received = None ∧ p′.sent = p.sent

client_oracle_send A (p : ClientPkg) (d : dist A) (_ : unit) (p′ : ClientPkg) ≜
  p.sent = None ∧ p′.sent = Some d ∧ p′.received = p.received

Machine States

client_state A ≜ (com A ∗ state A ClientPkg unit)

machine_state N A ≜ {
  clients : [0..N − 1] → client_state A;
  hist : seq ([0..N − 1] → dist A)
}

all_clients_have_sent A (m : machine_state) (f : [0..N − 1] → dist A) ≜
  ∀i : [0..N − 1]. let (_, σ) ≜ m.clients i in
    (SOracleSt σ).received = None ∧ (SOracleSt σ).sent = Some (f i)

Machine Step (⊢ m ⟹ m′)

cost_vec A i : A → Q ≜ λa. Σ_{p : [0..N−1] → A | p i = a} (Π_{j ≠ i} f j (p j)) ∗ C_i(p)

**Fig. 8.** Semantics of the distributed machine

The machine step relation models a server–client protocol, distinguishing server steps (ServerStep) from client steps (ClientStep). Client steps, which run commands in the language of Fig. 7, may interleave arbitrarily. Server steps are synchronized by the all_clients_have_sent relation to run only after all clients have completed the current round. The work done by the server is modeled by the auxiliary relation server_sent_cost_vector i f m m′, which constructs and sends to client i the cost vector derived from the set of client distributions f. The relation σ ∼_O σ′ states that σ and σ′ are equal up to their SOracleSt components.

In the distributed MW setting, the cost to player i of a particular action a : A is defined as the expected value, over all N-player strategy vectors p in which player i chose action a (p_i = a), of the cost to player i of p, with the expectation taken over the (N − 1)-size product distribution induced by the players j ≠ i.
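For a small finite game this expectation can be computed exactly by enumerating the product distribution of the other players. The following Python sketch uses a hypothetical example cost function (the names are ours, not the paper's):

```python
from itertools import product

# Exact computation of the per-action cost described above: the cost to
# player i of action a is the expectation of cost(i, p) over profiles p
# with p_i = a, the other coordinates drawn from the product of the other
# players' distributions. The cost function here is a hypothetical example.
def expected_cost(i, a, dists, actions, cost):
    others = [j for j in range(len(dists)) if j != i]
    total = 0.0
    for choice in product(actions, repeat=len(others)):
        p = [None] * len(dists)
        p[i] = a
        weight = 1.0
        for j, b in zip(others, choice):
            p[j] = b
            weight *= dists[j][b]  # product distribution over players j != i
        total += weight * cost(i, p)
    return total

# Two players; player i pays 1 whenever both picked the same action.
cost = lambda i, p: 1.0 if p[0] == p[1] else 0.0
dists = [{0: 0.5, 1: 0.5}, {0: 0.75, 1: 0.25}]
print(expected_cost(0, 0, dists, [0, 1], cost))  # 0.75
```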

### **6.2 Convergence and Optimality**

Our proof that MW is no regret (Sect. 5) extends to system-wide convergence and optimality guarantees with respect to the distributed execution model of Fig. 8, in which each client runs our MW implementation. The proof has three major steps:

1. No-regret clients remain no regret when interleaved in the distributed semantics of Fig. 8.
2. System-wide convergence: the time-averaged history of the clients' distributions forms an approximate Coarse Correlated Equilibrium (ε-CCE).
3. System-wide regret bounds: for smooth games, the expected cost of that distribution is bounded with respect to the optimal cost.
Composing 1, 2, and 3 proves that the distributed machine of Fig. 8 – when instantiated to clients running MW – converges to near-optimal solutions to smooth games. We briefly describe each part in turn.

*Part 1* : No-regret clients are still no regret when interleaved. That MW no-regret bounds lift to an MW client running in the context of the distributed operational semantics of Fig. 8 follows from the oracular structure of our implementation of MW (Sect. 5) – clients interact with other clients and with the server only through the oracle.

In particular, for any execution m =⇒+ m′ of the machine of Fig. 8, and for any client i, there is a corresponding execution of client i with respect to a small nondeterministic oracle that simply "guesses" which cost vector to supply every time the MW client executes a recv operation. Because MW is no regret for all possible sequences of cost vectors, proving a refinement against the nondeterministic oracle implies a regret bound on client i's execution from state m_i to state m′_i.

We lift this argument to all the clients running in the Fig. 8 semantics by proving the following theorem:

```
Theorem all_clients_bounded_regret A m m' T (ε : rat) :
  hist m = nil →
  0 < size (hist m') →
  final_state m' →
  m =⇒+ m' →
  (∀ i, m.clients i = (mult_weights A T, init_state A η tt (init_ClientPkg A))) →
  η + ln (size A) / (η*T) ≤ ε →
  machine_regret_eps m' ε.
```

The predicate machine regret eps holds in state s′, against regret bound ε, if all clients have expected regret in state s′ at most ε (with respect to the σ_T distribution we describe below), for any rational ε larger than η + ln (size A)/(η∗T) (the regret bound we proved of MW in Sect. 5).

We assume that the history is empty in the initial state (hist m = nil) and that at least one round was completed (0 < size (hist m′)). By final state m′, we mean that all clients have synchronized with the server (by receiving a cost vector and sending a distribution) and then terminated in CSkip. All clients in state m are initialized to execute T steps of MW over game A (mult weights A T), from an initial state and initial ClientPkg.
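As a quick sanity check on the shape of the bound η + ln (size A)/(η∗T): it is a standard fact from the textbook MW analysis (not a claim proved in the paper's Coq development) that choosing η = √(ln |A|/T) minimizes the bound to 2√(ln |A|/T), which vanishes as T grows:

```python
import math

# Standard MW tuning (textbook analysis, not part of the Coq development):
# the bound eta + ln|A|/(eta*T) is minimized at eta = sqrt(ln|A|/T), where
# it equals 2*sqrt(ln|A|/T) and so tends to 0 as the number of rounds T grows.
def regret_bound(eta, num_actions, T):
    return eta + math.log(num_actions) / (eta * T)

def optimal_eta(num_actions, T):
    return math.sqrt(math.log(num_actions) / T)

A, T = 10, 10_000
eta = optimal_eta(A, T)
print(regret_bound(eta, A, T))  # ~0.0303 = 2*sqrt(ln 10 / 10000)
```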

*Part 2: System-wide convergence to an ε-CCE.* The machine semantics of Fig. 8 converges to an approximate Coarse Correlated Equilibrium (ε-CCE).

More formally, consider an execution m =⇒+ m′ of the Fig. 8 semantics that results in a state m′ for which machine regret eps m′ ε (all clients have regret at most ε, as established in Part 1). The distribution σ_T, defined as the time-averaged history of the product of the distributions output by the MW clients at each round, is an ε-CCE:

$$
\sigma\_T \stackrel{\Delta}{=} \lambda p. \; \frac{\sum\_{i=1}^T \prod\_{j=1}^N (\text{hist } m')\_i^j \ p\_j}{T}
$$

By (hist m′)ᵢʲ we mean the distribution associated to player j at time i, as recorded in the execution history stored in state m′. The value ((hist m′)ᵢʲ pⱼ) is the probability that client j chose action pⱼ in round i.
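The definition of σ_T can be transcribed directly. The Python sketch below (the history layout and names are ours) averages the per-round product distributions and yields a genuine distribution over strategy profiles:

```python
import math
from itertools import product

# Illustrative sketch of sigma_T as defined above: the time average over
# rounds i of the product over players j of client j's round-i distribution.
# hist[i][j] maps each action to client j's probability at round i+1.
def sigma_T(hist, actions, num_players):
    T = len(hist)
    def prob(profile):
        per_round = (
            math.prod(hist[i][j][profile[j]] for j in range(num_players))
            for i in range(T)
        )
        return sum(per_round) / T
    return {p: prob(p) for p in product(actions, repeat=num_players)}

hist = [
    [{0: 1.0, 1: 0.0}, {0: 0.5, 1: 0.5}],  # round 1: client 0 plays 0
    [{0: 0.0, 1: 1.0}, {0: 0.5, 1: 0.5}],  # round 2: client 0 plays 1
]
dist = sigma_T(hist, [0, 1], 2)
print(dist, sum(dist.values()))
```

Note that σ_T is in general correlated across players even though each round's distribution is a product, which is why the guarantee is a *coarse correlated* equilibrium.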

We formalize this property in the following Coq theorem:

```
Theorem machine_regret_eCCE m' ε :
  machine_regret_eps m' ε →
  eCCE ε σ_T.
```
which states that σ_T is an eCCE, with approximation factor ε, as long as each client's expected regret over σ_T is at most ε (machine regret eps m′ ε) – exactly the property we proved in Part 1 above.

*Part 3: System-wide regret bounds.* The machine semantics of Fig. 8 converges to a state with expected cost bounded with respect to the optimal cost.

Consider an execution of the Fig. 8 semantics m =⇒+ m′ and an ε satisfying the conditions of all clients bounded regret. If the underlying game is smooth, the expected cost of the time-averaged distribution σ_T of the clients in m′ is bounded with respect to the cost of an optimal strategy profile s by the following Coq theorem:

```
Theorem systemwide_POA_bound A m m' T (ε : rat) s :
  hist m = nil →
  m =⇒+ m' →
  0 < size (hist m') →
  final_state m' →
  (∀ i, m.clients i = (mult_weights A T, init_state A η tt (init_ClientPkg A))) →
  η + ln (size A) / (η*T) ≤ ε →
  optimal s →
  ExpectedCost σ_T ≤ λ/(1−μ) * Cost s + N*ε/(1−μ).
```

In the above theorem, λ and μ are the smoothness parameters of the game A, while N is the number of players. Cost s is the social (total) cost of the optimal state s.

### **7 Related Work**

*Reinforcement Learning, Bandits.* There is extensive work on reinforcement learning [39], multi-agent reinforcement learning (MARL [19]), and multi-armed bandits (MAB, [15]), more than can be cited here. We note, however, that Q-learning [41], while similar in spirit to MW, addresses the more general scenario in which an agent's action space is modeled by an arbitrary Markov Decision Process (in MW, the action space is a single set A). Our verified MW implementation is most suitable, therefore, for use in the full-information analog of MAB problems, in which actions are associated with "arms" and each agent learns the cost of all arms – not just the one it pulled – at each time step. In this domain, MW has good convergence bounds, as we prove formally of our implementation in this paper. Relaxing our verified MW and formal proofs to the partial-information bandit setting is interesting future work.

*Verified Distributed Systems.* EventML [33] is a domain-specific language for specifying distributed algorithms in the Logic of Events, which can be mechanically verified within the Nuprl proof assistant. Work has been done to develop methods for formally verifying distributed systems in Isabelle [20]. Model checking has been used extensively (e.g., [21,24]) to test distributed systems for bugs.

Verdi [42] is a Coq framework for implementing verified distributed systems. A Verdi system is implemented as a collection of handler functions which exchange messages through the network or communicate with the "outside world" via input and output. Application-level safety properties of the system can be proved with respect to a simple, idealized network semantics. A verified system transformer (VST) can then be used to transform the executable system into one which is robust to network faults such as reordering, duplication, and dropping of packets. The safety properties of the system proved under the original network semantics are preserved under the new faulty semantics, with minimal additional proof effort required of the programmer.

The goals of Verdi are complementary to our own. We implement a verified no-regret MW algorithm, together with a language of Roughgarden smooth games, for constructing distributed systems with verified convergence and correctness guarantees. Verdi allows safety properties of a distributed system to be lifted to analogous systems which tolerate various network faults, and provides a robust runtime system for execution in a practical setting. It stands to reason, then, that Verdi (as well as follow-on related work such as [37]) may provide a natural avenue for building robust executable versions of our distributed applications. We leave this for future work.

Chapar [23] is a Coq framework for verifying causal consistency of distributed key-value stores as well as correctness of client programs with respect to causally consistent key-value stores. The implementation of a key-value store is proved correct with respect to a high-level specification using a program refinement method similar to ours. Although Chapar's goal isn't to verify robustness to network faults, node crashes and message losses are modeled by its abstract operational semantics.

IronFleet [18] is a framework and methodology for building verified distributed systems using a mix of TLA-style state machine refinement, Hoare logic, and automated theorem proving. An IronFleet system is comprised of three layers: a high-level state machine specification of the overall system, a more detailed distributed protocol layer which describes the behavior of each agent in the system as a state machine, and the implementation layer in which each agent is programmed using a variant of the Dafny [22] language extended with a trusted set of UDP networking operations. Correctness properties are proved with respect to the high-level specifications, and a series of refinements is used to prove that every behavior in the implementation layer is a refinement of some behavior in the high-level specification. IronFleet has been used to prove safety and liveness properties of IronRSL, a Paxos-based replicated state machine, as well as IronKV, a shared key-value store.

*Alternative Proofs.* Variant proofs of Theorem 1, such as the one via KLdivergence (cf. [1, Section 2.2]), could be formalized in our framework without modifying most parts of the MW implementation. In particular, because we have proved once and for all that our interpreted MW refines a high-level specification of MW, it would be sufficient to formalize the new proof just with respect to the high-level program of Sect. 5.6.

### **8 Conclusion**

This paper reports on the first formally verified implementation of Multiplicative Weights (MW), a simple yet powerful algorithm for approximately solving Coarse Correlated Equilibria, among many other applications. We prove our MW implementation correct via a series of program refinements with respect to a high-level implementation of the algorithm. We present a DSL for building smooth games and show how to compose MW with smoothness to build distributed systems with verified Price of Anarchy bounds. Our implementation and proof are open source and available online.

**Acknowledgments.** This material is based on work supported by the National Science Foundation under Grant No. CCF-1657358. We thank the ESOP anonymous referees for their comments on an earlier version of this paper.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Program Verification by Coinduction**

Brandon Moore¹, Lucas Peña²(B), and Grigore Roșu¹,²

¹ Runtime Verification, Inc., Urbana, IL, USA
² University of Illinois at Urbana-Champaign, Urbana, IL, USA
lpena7@illinois.edu

**Abstract.** We present a novel program verification approach based on coinduction, which takes as input an operational semantics. No intermediates like program logics or verification condition generators are needed. Specifications can be written using any state predicates. We implement our approach in Coq, giving a certifying language-independent verification framework. Our proof system is implemented as a single module imported unchanged into language-specific proofs. Automation is reached by instantiating a generic heuristic with language-specific tactics. Manual assistance is also smoothly allowed at points the automation cannot handle. We demonstrate the power and versatility of our approach by verifying algorithms as complicated as Schorr-Waite graph marking and instantiating our framework for object languages in several styles of semantics. Finally, we show that our coinductive approach subsumes reachability logic, a recent language-independent sound and (relatively) complete logic for program verification that has been instantiated with operational semantics of languages as complex as C, Java and JavaScript.

### **1 Introduction**

Formal verification is a powerful technique for ensuring program correctness, but it requires a suitable verification framework for the target language. Standard approaches such as Hoare logic [1] (or verification condition generators) require significant effort to adapt and prove sound and relatively complete for a given language, with few or no theorems or tools that can be reused between languages. To use a software engineering metaphor, Hoare logic is a design pattern rather than a library. This becomes literal when we formalize it in a proof assistant.

We present instead a single language-independent program verification framework, to be used with an executable semantics of the target programming language given as input. The core of our approach is a simple theorem which gives a coinduction principle for proving partial correctness.

To trust a non-executable semantics of a desired language, an equivalence to an executable semantics is typically proved. Executable semantics of programming languages abound in the literature. Recently, executable semantics of several real languages have been proposed, e.g., of C [2], Java [3], JavaScript [4,5], Python [6], PHP [7], CAML [8], thanks to the development of executable-semantics engineering frameworks like K [9], PLT-Redex [10], Ott [11], etc., which

make defining a formal semantics for a programming language almost as easy as implementing an interpreter, if not easier. Our coinductive program verification approach can be used with any of these executable semantics or frameworks, and is correct-by-construction: no additional "axiomatic semantics", "program logic", or "semantics suitable for verification" with soundness proofs needed.

As detailed in Sect. 6, we are not the first to propose a language-independent verification infrastructure that takes an operational semantics as input, nor the first to propose coinduction for proving isolated properties about some programs. However, we believe that coinduction can offer a fresh, promising and general approach as a language-independent verification infrastructure, with a high potential for automation that has not been fully explored yet. In this paper we make two steps in this direction, by addressing the following research questions:

**RQ1** Can a coinductive, language-independent verification framework be instantiated with operational semantics in a variety of styles, and can its proofs be effectively automated?

**RQ2** Does the coinductive approach subsume existing language-independent verification infrastructures, in particular reachability logic?
To address RQ1, we make use of a key mathematical result, Theorem 1, which has been introduced in more general forms in the literature, e.g., in [12,13] and in [14]. We mechanized it in Coq in a way that allows us to instantiate it with a transition relation corresponding to any target language semantics, hereby producing certifying program verification for that language. Using the resulting coinduction principle to show that a program meets a specification produces a proof which depends only on the operational semantics. We demonstrate our proofs can be effectively automated, on examples including heap data structures and recursive functions, and describe the implemented proof strategy and how it can be reused across languages defined using a variety of operational styles.

To address RQ2, we show that our coinductive approach not only subsumes reachability logic [15], whose practicality has been demonstrated with languages like C, Java, and JavaScript, but also offers several specific advantages. Reachability logic consists of a sound and (relatively) complete proof system that takes a given language operational semantics as a *theory* and derives reachability properties about programs in that language. A mechanical procedure can translate any proof using reachability logic into a proof using our coinductive approach.

We first introduce our approach with a simple intuitive example, then prove its correctness. We then discuss mechanical verification experiments across different languages, show how reachability logic proofs can be translated into coinductive proofs, and conclude with related and future work. Our entire Coq formalization, proofs and experiments are available at [16].

### **2 Overview and Basic Notions**

Section 4 will show the strengths of our approach by means of verifying rather complex programs. Here our objective is different, namely to illustrate it by verifying a trivial IMP (C-style) program: s=0; while (--n) {s=s+n;}. Let sum stand for the program and loop for its while loop. When run with a positive initial value <sup>n</sup> of n, it sets s to the sum of 1,...,n−1. To illustrate non-termination, we assume unbounded integers, so loop runs forever for non-positive <sup>n</sup>. An IMP language syntax sufficient for this example and a possible execution trace are given in Fig. 1. The exact step granularity is not critical for our approach, as long as diverging executions produce infinite traces.


**Fig. 1.** Syntax of **IMP** (left) and sample execution of sum (right)
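As an executable illustration of the intended behavior (a plain Python simulation, not the operational semantics itself), running sum with a positive n terminates with n = 0 and s = 1 + ··· + (n−1), and diverges otherwise; we cut off divergence with a step bound:

```python
# A tiny simulation of sum: s=0; while (--n) {s=s+n;}. We assume unbounded
# integers, so non-positive initial n diverges; a fuel bound cuts that off.
def run_sum(n, max_steps=10_000):
    s = 0
    steps = 0
    while True:
        n -= 1            # --n evaluates to the decremented value
        steps += 1
        if steps > max_steps:
            return None   # treated as divergence for this illustration
        if n == 0:
            return {"n": n, "s": s}
        s += n            # loop body runs only when the guard is nonzero

print(run_sum(5))  # {'n': 0, 's': 10}  (1+2+3+4)
```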

While our coinductive program verification approach is self-contained and thus can be presented without reliance on other verification approaches, we prefer to start by discussing the traditional Hoare logic approach, for two reasons. First, it will put our coinductive approach in context, showing also how it avoids some of the limitations of Hoare logic. Second, we highlight some of the subtleties of Hoare logic when related to operational semantics, which will help understand the reasons and motivations underlying our definitions and notations.

### **2.1 Intuitive Hoare Logic Proof**

A Hoare logic specification/triple has the form {|ϕ*pre* |} code {|ϕ*post*|}. The convenience of this notation depends on specializing to a particular target language, such as allowing variable names to be used directly in predicates to stand for their values, or writing only the current statement. This hides details of the environment/state representation, and some framing conventions or compositionality assumptions over the unmentioned parts. A Hoare triple specifies a set of (partial correctness) reachability claims about a program's behavior, and it is

(IMP statement rules)

$$\frac{\cdot}{\{\!|\,\varphi[\mathtt{e}/\mathtt{x}]\,|\!\}\ \mathtt{x=e;}\ \{\!|\,\varphi\,|\!\}}\tag{HL-asgn}$$

$$\frac{\{\!|\,\varphi_1\,|\!\}\ \mathtt{s_1}\ \{\!|\,\varphi_2\,|\!\},\quad \{\!|\,\varphi_2\,|\!\}\ \mathtt{s_2}\ \{\!|\,\varphi_3\,|\!\}}{\{\!|\,\varphi_1\,|\!\}\ \mathtt{s_1}\ \mathtt{s_2}\ \{\!|\,\varphi_3\,|\!\}}\tag{HL-seq}$$

$$\frac{\{\!|\,\varphi \wedge \mathtt{e} \neq 0\,|\!\}\ \mathtt{s_1}\ \{\!|\,\varphi'\,|\!\},\quad \{\!|\,\varphi \wedge \mathtt{e} = 0\,|\!\}\ \mathtt{s_2}\ \{\!|\,\varphi'\,|\!\}}{\{\!|\,\varphi\,|\!\}\ \mathtt{if\ (e)\ \{s_1\}\ else\ \{s_2\}}\ \{\!|\,\varphi'\,|\!\}}\tag{HL-if}$$

$$\frac{\{\!|\,\varphi \wedge \mathtt{e} \neq 0\,|\!\}\ \mathtt{s}\ \{\!|\,\varphi\,|\!\}}{\{\!|\,\varphi\,|\!\}\ \mathtt{while\ (e)\ \{s\}}\ \{\!|\,\varphi \wedge \mathtt{e} = 0\,|\!\}}\tag{HL-while}$$

(Generic rule)

$$\frac{\vDash \psi \to \varphi,\quad \{\!|\,\varphi\,|\!\}\ \mathtt{s}\ \{\!|\,\varphi'\,|\!\},\quad \vDash \varphi' \to \psi'}{\{\!|\,\psi\,|\!\}\ \mathtt{s}\ \{\!|\,\psi'\,|\!\}}\tag{HL-conseq}$$

**Fig. 2. IMP** program logic.

typically an over-approximation (i.e., it specifies more reachability claims than desired or feasible). Specifically, assume some formal language semantics of IMP defining an execution step relation R ⊆ C × C on a set C of configurations of the form ⟨code | σ⟩, like those in Fig. 1. We write a →R b for (a, b) ∈ R. Section 2.3 (Fig. 3) discusses several operational semantics approaches we experimented with (Sect. 4) that yield such step relations R. A (partial correctness) *reachability claim* (c, P), relating an initial state c ∈ C and a target set of states P ⊆ C, is *valid* (or *holds*) iff the initial state c can either reach a state in P or can take an infinite number of steps (with →R); we write c ⇒R P to indicate that claim (c, P) is valid, and a → b or c ⇒ P instead of a →R b or c ⇒R P, resp., when R is understood. Then {|ϕ*pre* |} code {|ϕ*post*|} specifies the set of reachability claims

$$\{ (\langle \mathtt{code} \mid \sigma\_{pre} \rangle,\ \{ \langle \mathtt{skip} \mid \sigma\_{post} \rangle \mid \sigma\_{post} \models \varphi\_{post} \}) \mid \sigma\_{pre} \models \varphi\_{pre} \}$$

and it is *valid* iff all of its reachability claims are valid. It is necessary for P in reachability claims (c, P) specified by Hoare triples to be a set of configurations (and thus an over-approximation): it is generally impossible for ϕ*post* to determine exactly the possible final configuration or configurations.

While one can prove Hoare triples valid directly using the step relation →R and induction, or coinduction like we propose in this paper, the traditional approach is to define a language-specific proof system for deriving Hoare triples from other triples, also known as *a* Hoare logic, or program logic, for the target programming language. Figure 2 shows such a program logic for IMP. Hoare logics are generally not executable, so testing cannot show whether they match the *intended* semantics of the language. Even for a simple language like IMP, if one mistakenly writes e = 1 instead of e = 0 in rule (HL-while), then one gets an incorrect program logic. When trusted verification is desired, the program logic needs to be proved sound w.r.t. a reference executable semantics of the language, i.e., that each derivable Hoare triple is valid. This is a highly non-trivial task for complex languages (C, Java, JavaScript), in addition to the effort of defining a Hoare logic itself. Our coinductive approach completely avoids this difficulty by requiring no additional semantics of the programming language for verification purposes.

The property to prove is that sum (or more specifically loop) exits only when n is 0, with s as the sum $\sum_{i=1}^{n-1} i$ (or $\frac{n(n-1)}{2}$). In more detail, any configuration whose statement begins with sum and whose store defines n as n can run indefinitely or reach a state where it has just left the loop with n ↦ 0, s ↦ $\sum_{i=1}^{n-1} i$, and the store otherwise unchanged. As a Hoare logic triple, that specification is

$$\{\!|\,\mathtt{n}=n\,|\!\}\ \ \mathtt{s{=}0;\ while\,({-}{-}n)\,\{s{=}s{+}n;\}}\ \ \{\!|\,\mathtt{s}=\textstyle\sum\_{i=1}^{n-1} i \,\wedge\, \mathtt{n}=0\,|\!\}$$

As seen, this Hoare triple asserts the validity of the set of reachability claims

$$S \equiv \{ (c\_{n, \sigma}, P\_{n, \sigma}) \mid \forall n, \forall \sigma \text{ undefined in } \mathbf{n} \} \tag{1}$$

where

$$\begin{aligned} c\_{n,\sigma} & \equiv \langle \mathtt{s=0}; \ \mathtt{while(--n)} \{ \mathtt{s=s+n}; \} \mid \mathtt{n} & \longmapsto n, \ \sigma \rangle \\ P\_{n,\sigma} & \equiv \{ \langle \mathtt{skip} \,|\, \mathtt{n} & \longmapsto 0, \mathtt{s} & \longmapsto \sum\_{i=1}^{n-1} i, \sigma' \rangle \mid \forall \sigma' \text{ undefined in } \mathtt{n}, \mathtt{s} \} \end{aligned}$$

We added the σ and σ′ state frames above for the sake of complete details about what Hoare triples actually specify, and to illustrate why P in claims (c, P) needs to be a set. Since the addition/removal of σ and σ′ does not change the subsequent proofs, for the remainder of this section, for simplicity, we drop them.

Now let us assume, without proof, that the proof system in Fig. 2 is sound (for the executable step relation →R of IMP discussed above), and let us use it to derive a proof of the sum example. Note that the proof system in Fig. 2 assumes that expressions have no side effects and thus can be used unchanged in state formulae, which is customary in Hoare logics, so the program first needs to be translated into an equivalent one without the problematic --n. We could instead have added more Hoare logic rules to handle such expressions, but this would quickly make our program logic significantly more complicated. Either way, with even a simple imperative programming language like we have here, it is necessary to either add Hoare logic rules to Fig. 2 or to modify our code segment. These inconveniences are taken for granted in Hoare logic based verifiers, and they require non-negligible additional effort if trusted verification is sought. For comparison, the coinductive verification approach proposed in this paper requires no transformation of the original program. After modifying the problematic expression, our code segment becomes the (hopefully) equivalent code:

s=0; n=n-1; while (n) {s=s+n; n=n-1;}
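As a quick check that the translation is plausible (an informal test, not a proof of equivalence), both versions can be run side by side for positive n:

```python
# Informal check (not a proof) that s=0; while(--n){s=s+n;} and its
# side-effect-free translation s=0; n=n-1; while(n){s=s+n; n=n-1;}
# compute the same sum for positive n, assuming unbounded integers.
def run_sum_original(n):
    s = 0
    while True:
        n -= 1          # --n decrements, then its value is the loop guard
        if n == 0:
            return s
        s += n

def run_sum_translated(n):
    s = 0
    n = n - 1           # the decrement hoisted out of the guard
    while n != 0:
        s += n
        n -= 1
    return s

assert all(run_sum_original(n) == run_sum_translated(n) == n * (n - 1) // 2
           for n in range(1, 50))
print("programs agree on n in 1..49")
```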

Let loop' be the new loop and let <sup>ϕ</sup>*inv* , its invariant, be

$$\mathbf{s} = \frac{((n-1)-\mathbf{n})\,(n+\mathbf{n})}{2}$$

The program variable n stands for its current value, while the mathematical variable <sup>n</sup> stands for the initial (sometimes called "old") value of n. Next, using the assign and sequence Hoare logic rules in Fig. 2, as well as basic arithmetic via the (HL-conseq) rule, we derive

$$\left\{\mathbf{n} = n\right\|\;\;\mathbf{s} \mathbf{=0};\;\mathbf{n} \mathbf{=n-1};\;\;\left\{\varphi\_{inv}\right\}\tag{2}$$

Similarly, we can derive {|ϕ*inv* <sup>∧</sup> <sup>n</sup> = 0|} s=s+n; n=n-1; {|ϕ*inv* |}. Then, applying the while rule, we derive {|ϕ*inv* |} loop' {|ϕ*inv* <sup>∧</sup> <sup>n</sup> = 0|}. The rest follows by the sequence rule with the above, (2), and basic arithmetic.

This example is not complicated, in fact it is very intuitive. However, it abstracts out a lot of details in order to make it easy for a human to understand. It is easy to see the potential difficulties that can arise in larger examples from needing to factor out the side effect, and from mixing both program variables and mathematical variables in Hoare logic specifications and proofs. With our coinduction verification framework, all of these issues are mitigated.

### **2.2 Intuitive Coinduction Proof**

Since our coinductive approach is language-independent, we do not commit to any particular, language-specific formalism for specifying reachability claims, such as Hoare triples. Consequently, we will work directly with raw reachability claims/specifications S ⊆ C × P(C) consisting of sets of pairs (c, P) with c ∈ C and P ⊆ C, as seen above. We show how to coinductively prove the claim for the example sum program in the form given in (1), relying on nothing but a general language-independent coinductive machinery and the trusted execution step relation →R of IMP. Recall that we drop the state frames (σ) in (1).

Intuitively, our approach consists of symbolic execution with the language step relation, plus coinductive reasoning for circular behaviors. Specifically, suppose that S*circ* ⊆ C × P(C) is a specification corresponding to some code with circular behavior, say some loop. Pairs (c, P) ∈ S*circ* with c ∈ P are already valid, that is, c ⇒R P for those. "Execute" the other pairs (c, P) ∈ S*circ* with the step relation →R, obtaining a new specification S′ containing pairs of the form (d, P), where c →R d; since we usually have a mathematical description of the pairs in S*circ* and S′, this step has the feel of symbolic execution. Note that S*circ* is valid if S′ is valid. Do the same for S′, obtaining a new specification S′′, and so on and so forth. If at any moment during this (symbolic) execution process we reach a specification that is included in our original S*circ*, then simply assume that it is valid. While this kind of cyclic reasoning may not seem sound, it is in fact valid, and justified by *coinduction*, which captures the essence of partial correctness, *language-independently*. Reaching something from the original specification shows we have reached some fixpoint, and coinduction is directly related to greatest fixpoints. This is explained in detail in Sect. 3.
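On the concrete sum loop, this argument can be animated for a fixed initial n (here N = 6; a finite illustration only, since the real S*circ* quantifies over all n′): stepping any claim in S*circ* lands either in the target set or back in S*circ*, which is exactly what the coinduction principle requires.

```python
# Finite illustration of the "step, then loop back into S_circ" argument,
# for the original loop while(--n){s=s+n;} with initial n = N = 6.
# States are (position, n, s); s_circ plays the role of the invariant set
# Q_N restricted to the n' reachable from n = N, and target plays T_N.
N = 6

def loop_step(state):
    _, n, s = state
    n -= 1                      # --n evaluates to the decremented value
    if n == 0:
        return ("exit", n, s)   # guard failed: execution leaves the loop
    return ("loop", n, s + n)   # guard held: run the body s = s + n

s_circ = {("loop", k, sum(range(k, N))) for k in range(1, N + 1)}
target = {("exit", 0, sum(range(1, N)))}

for state in s_circ:
    succ = loop_step(state)
    assert succ in s_circ or succ in target
print("every S_circ claim steps to the target or back into S_circ")
```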

In many examples it is useful to chain together individual proofs, similar to (HL-seq). Thus, we introduce the following sequential composition construct:

**Definition 1.** *For* S₁, S₂ ⊆ C × P(C)*, let* S₁ ⨾ S₂ ≡ {(c, P) | ∃Q . (c, Q) ∈ S₁ ∧ ∀d ∈ Q, (d, P) ∈ S₂}*. Also, we define* trans(S) *as* S ⨾ S *(*trans *can be thought of as a transitivity proof rule).*

If S₁ and S₂ are valid then S₁ ⨟ S₂ is also valid (Lemma 2).
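For finite sets of claims, Definition 1 is directly executable; here is a minimal Python sketch (the names `seq_compose` and `trans` are ours, and target sets are encoded as `frozenset`s so claims are hashable):

```python
def seq_compose(S1, S2):
    """S1 ; S2 from Definition 1: keep (c, P) when some (c, Q) in S1
    has every d in Q covered by a claim (d, P) in S2."""
    targets = {P for (_d, P) in S2}
    return {(c, P)
            for (c, Q) in S1
            for P in targets
            if all((d, P) in S2 for d in Q)}

def trans(S):
    """trans(S) = S ; S, the transitivity proof rule."""
    return seq_compose(S, S)

# Claims over configurations 'a'..'z'; frozensets play the role of P(C).
S1 = {('a', frozenset({'b', 'c'}))}
S2 = {('b', frozenset({'z'})), ('c', frozenset({'z'}))}
assert seq_compose(S1, S2) == {('a', frozenset({'z'}))}
```

The composed claim says: from `'a'` we reach `{'b', 'c'}`, and from each of those we reach `{'z'}`, so from `'a'` we reach `{'z'}`.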

Given n, let Q<sup>n</sup> and T<sup>n</sup> be the following sets of configurations, where Q<sup>n</sup> and T<sup>n</sup> represent the *invariant set* and *terminal set*, respectively:

$$\begin{aligned} Q_n &\equiv \{\langle \texttt{loop} \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i\rangle \mid \forall n'\} \\ T_n &\equiv \{\langle \texttt{skip} \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i\rangle\} \end{aligned}$$

and let us define the following specifications:

$$\begin{aligned} S_1 &\equiv \{(\langle \texttt{s=0; loop} \mid \texttt{n} \mapsto n\rangle, Q_n) \mid \forall n\} \\ S_2 &\equiv \{(\langle \texttt{loop} \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i\rangle, T_n) \mid \forall n, n'\} \end{aligned}$$

Our target S in (1) is included in S₁ ⨟ S₂, so it suffices to show that S₁ and S₂ are valid. S₁ clearly is: ⟨s=0; loop | n ↦ n⟩ →⁺_R ⟨loop | n ↦ n, s ↦ 0⟩ represents the (symbolic) execution step or steps taken to assign program variable s, and the specification {(⟨loop | n ↦ n, s ↦ 0⟩, Q_n) | ∀n} is trivially valid because each configuration is already in Q_n, choosing n' = n (note Σ_{i=n}^{n-1} i = 0). For the validity of S₂, we partition it into two subsets, one where n' = 1 and another where n' ≠ 1 (case analysis). The former holds the same as S₁, noting that

$$\langle \texttt{loop} \mid \texttt{n} \mapsto 1, \texttt{s} \mapsto \sum_{i=1}^{n-1} i\rangle \to_R^+ \langle \texttt{skip} \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i\rangle$$

The latter holds by coinduction (for S2), because first

$$\langle \texttt{loop} \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i\rangle \to_R^+ \langle \texttt{loop} \mid \texttt{n} \mapsto n'-1, \texttt{s} \mapsto \sum_{i=n'-1}^{n-1} i\rangle$$

and second the following inclusion holds:

$$\{(\langle \texttt{loop} \mid \texttt{n} \mapsto n'-1, \texttt{s} \mapsto \sum_{i=n'-1}^{n-1} i\rangle, T_n) \mid \forall n, n'\} \subseteq S_2$$

The key part of the proof above was to show that the reachability claim about the loop (S2) was stable under the language semantics. Everything else was symbolic execution using the (trusted) operational semantics of the language. By allowing desirable program properties to be uniformly specified as reachability claims about the (executable) language semantics itself, our approach requires no auxiliary formalization of the language for verification purposes, and thus no soundness or equivalence proofs and no transformations of the original program to make it fit the restrictions of the auxiliary semantics. Unlike for the Hoare logic proof, the main "proof rules" used were just performing execution steps using the operational semantics rules, as well as the generic coinductive principle. Section 3 provides all the technical details.

[Figure content not recoverable from extraction. Top panel: Structural Operational Semantics of IMP. Middle panel: Reduction Semantics (evaluation contexts syntax omitted—[17]). Bottom panel: K Semantics (configuration and strictness omitted—[9]), plus the last five simple rules from the reduction semantics.]

**Fig. 3.** Three different operational semantics of **IMP**, generating the same execution step relation R (or →_R).

### **2.3 Defining Execution Step Relations**

Since our coinductive verification framework is parametric in a step relation, which also becomes the only trust base when certified verification is sought, it is imperative for its practicality to support a variety of approaches to define step relations. Ideally, it should not be confined to any particular semantic style that ultimately defines a step relation, and it should simply take existing semantics "off-the-shelf" and turn them into sound and relatively complete program verifiers for the defined languages. We briefly recall three of the semantic approaches that we experimented with in our Coq formalization [16].

Small-step structural operational semantics [18] (Fig. 3 top) is one of the most popular semantic approaches. It defines the transition relation inductively. This semantic style is easy to use, though it is often inconvenient for defining features such as abrupt changes of control and true concurrency. Additionally, finding the next successor of a configuration may take longer than in other approaches. Reduction semantics with evaluation contexts [17], depicted in the middle of Fig. 3, is another popular approach. It allows us to elegantly and compactly define complex evaluation strategies and the semantics of control-intensive constructs (e.g., call/cc), and it avoids a recursive definition of the transition relation. On the other hand, it requires an auxiliary definition of contexts, along with splitting and plugging functions.

As discussed in Sect. 1, several large languages have been given formal semantics using K [9] (Fig. 3 bottom). K is more involved and less conventional than the other approaches, so it is a good opportunity to evaluate our hypothesis that we can just "plug-and-play" operational semantics in our coinductive framework. A K-style semantics extends the code in the configuration to a list of terms, and evaluates within subterms by having a transition that extracts the term to the front of the list, where it can be examined directly. This allows a non-recursive definition of transition, whose cases can be applied by unification.

In practice, in our automation, we only need to modify how a successor for a configuration is found. Besides that, the proofs remain exactly the same.
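To make the "successor of a configuration" concrete, here is a small Python sketch of a successor function for an IMP-like language (the encoding is ours: statements as nested tuples, stores as dicts; expressions are evaluated in one go for brevity, whereas IMP's SOS also takes individual steps inside expressions):

```python
def eval_exp(e, sto):
    """Evaluate a side-effect-free expression against store sto."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):                       # program variable
        return sto[e]
    op, e1, e2 = e
    v1, v2 = eval_exp(e1, sto), eval_exp(e2, sto)
    return {'+': v1 + v2, '-': v1 - v2, '!=': int(v1 != v2)}[op]

def step(cfg):
    """One step of ->_R on (statement, store); None when there is no successor."""
    s, sto = cfg
    tag = s[0]
    if tag == 'seq':
        _, s1, s2 = s
        if s1 == ('skip',):
            return (s2, sto)                     # skip s2 -> s2
        s1p, stop = step((s1, sto))
        return (('seq', s1p, s2), stop)          # congruence on the first statement
    if tag == 'assign':
        _, x, e = s
        return (('skip',), {**sto, x: eval_exp(e, sto)})
    if tag == 'if':
        _, e, s1, s2 = s
        return ((s1 if eval_exp(e, sto) != 0 else s2), sto)
    if tag == 'while':
        _, e, body = s                           # unfold to a conditional, as in Fig. 3
        return (('if', e, ('seq', body, s), ('skip',)), sto)
    return None                                  # 'skip' has no successor

def run(cfg):
    """Iterate the successor function to termination; return the final store."""
    while (nxt := step(cfg)) is not None:
        cfg = nxt
    return cfg[1]

# A sum-like loop: s := 0; while (n != 0) { s := s + n; n := n - 1 }
prog = ('seq', ('assign', 's', 0),
        ('while', ('!=', 'n', 0),
         ('seq', ('assign', 's', ('+', 's', 'n')),
                 ('assign', 'n', ('-', 'n', 1)))))
```

Running `run((prog, {'n': 10}))` drives the claim by repeated successor computation, exactly the role `step` plays in our automation; only this function changes across semantic styles.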

### **3 Coinduction as Partial Correctness**

The intuitive coinductive proof of the correctness of sum in Sect. 2.2 likely raised a lot of questions. We give formal details of that proof in this section, as well as go through some definitions and results of the underlying theory. All proofs, including our Coq formalization, are in [16].

#### **3.1 Definitions and Main Theorem**

First, we introduce a definition that we used intuitively in the previous section:

**Definition 2.** *If* R ⊆ C × C*, let* valid_R ⊆ C × P(C) *be defined as* valid_R = {(c, P) | c ⇒_R P *holds*}*.*

Recall from Sect. 2.1 that c ⇒_R P holds iff the initial state c can either reach a state in P or can take an infinite number of steps (with →_R). Pairs (c, P) ∈ C × P(C) are called *claims* or *specifications*, and our objective is to prove that they hold, i.e., c ⇒_R P. A set of claims S ⊆ C × P(C) is valid if S ⊆ valid_R. To show such inclusions by coinduction, we notice that valid_R is a greatest fixpoint, specifically of the following operator:

**Definition 3.** *Given* <sup>R</sup> <sup>⊆</sup> <sup>C</sup> <sup>×</sup> <sup>C</sup>*, let* step<sup>R</sup> : <sup>P</sup>(<sup>C</sup> × P(C)) → P(<sup>C</sup> × P(C)) *be*

$$\text{step}\_R(S) = \{ (c, P) \mid c \in P \; \lor \; \exists d \;. \; c \to\_R d \land (d, P) \in S \} $$

Therefore, to prove (c, P) <sup>∈</sup> stepR(S), one must show either that <sup>c</sup> <sup>∈</sup> <sup>P</sup> or that (succ(c), P) <sup>∈</sup> <sup>S</sup>, where succ(c) is a resulting configuration after taking a step from c by the operational semantics.

**Definition 4.** *Given a monotone function* F : P(D) → P(D)*, let its* F-closure F* : P(D) → P(D) *be defined as* F*(X) = μY. F(Y) ∪ X*, where* μ *is the least fixpoint operator. This is well-defined, as* Y ↦ F(Y) ∪ X *is monotone for any* X*.*

The following lemma suffices for reachability verification:

**Lemma 1.** *For any* R ⊆ C × C *and* S ⊆ C × P(C)*, we have that* S ⊆ step_R(step*_R(S)) *implies* S ⊆ valid_R*.*

The intuition behind this lemma is captured in Sect. 2.2: we continue taking steps, and once we reach a set of states already seen, we know our claim is valid. This would not be sound if step_R(step*_R(S)) were replaced simply with step*_R(S), as X ⊆ F*(X) holds trivially for any F and X. Lemma 1 (along with elementary set properties) replaces the entire program logic shown in Fig. 2. The only formal definition specific to the target language is the operational semantics. Lemma 1 does not need to be modified or re-proven to use it with other languages or semantics. It generalizes into a more powerful result that can be used to derive a variety of coinductive proof principles:

**Theorem 1.** *If* F, G : P(D) → P(D) *are monotone and* G(F(A)) ⊆ F(G*(A)) *for any* A ⊆ D*, then* X ⊆ F(G*(X)) *implies* X ⊆ νF *for any* X ⊆ D*, where* νF *is the greatest fixpoint of* F*.*

Proofs, including a verified proof in our Coq formalization, are in [16]. The proof can also be derived from [12–14], though the techniques from those papers had previously not been applied to program verification. Lemma 1 is an easy corollary, with both F and G instantiated as step_R, along with a proof that ν step_R = valid_R (see [16]). However, instantiating F and G to the same function is not always best. An interesting and useful G is the transitivity function trans from Definition 1, which satisfies the hypothesis of Theorem 1 when F is step_R. [16] shows other sound instantiations of G.
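For a finite step relation, Lemma 1 even becomes an executable check; a minimal Python sketch (names and encoding are ours: `R` maps a configuration to its successors, and each claim is a pair of a configuration and a `frozenset` target). Membership in step*_R(S) is a least fixpoint, so it has a finite witness: an R-path ending in P or in a configuration that starts a claim of S.

```python
from collections import deque

def in_closure(R, S, c, P):
    """(c, P) in step_R*(S): search for a finite R-path from c that
    reaches P or a configuration d with (d, P) in S."""
    seen, queue = {c}, deque([c])
    while queue:
        d = queue.popleft()
        if d in P or (d, P) in S:
            return True
        for e in R.get(d, ()):
            if e not in seen:
                seen.add(e)
                queue.append(e)
    return False

def lemma1_check(R, S):
    """S ⊆ step_R(step_R*(S)): every claim is terminal (c in P) or takes
    one step into the closure; by Lemma 1 this implies S ⊆ valid_R."""
    return all(c in P or any(in_closure(R, S, d, P) for d in R.get(c, ()))
               for (c, P) in S)
```

On the countdown relation `{5: [4], ..., 1: [0]}` the claim (5, {0}) passes, while a self-looping configuration with an empty target set also passes: divergence is allowed by partial correctness.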

We can also use Theorem 1 with other definitions of validity expressible as a greatest fixpoint, e.g., all-path validity. For nondeterministic languages we might prefer to say <sup>c</sup> <sup>⇒</sup><sup>∀</sup> <sup>P</sup> holds if no path from <sup>c</sup> reaches a stuck configuration without passing through P. This is the greatest fixpoint of

$$\mathsf{step}_R^\forall(S) = \{(c, P) \mid c \in P \lor ((\exists d.\, c \to_R d) \land \forall d.\, (c \to_R d \text{ implies } (d, P) \in S))\}$$
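For a finite, finitely branching step relation, this all-path validity can be decided by a direct search; a Python sketch (encoding is ours: `R` maps a configuration to its list of successors): c ⇒∀ P holds exactly when no configuration reachable from c without passing through P is stuck outside P.

```python
def all_path_valid(R, c, P):
    """c =>∀ P over a finite relation R: explore everything reachable
    without crossing P, and fail only on a stuck state outside P."""
    seen, stack = set(), [c]
    while stack:
        d = stack.pop()
        if d in P or d in seen:
            continue                  # paths stop at P; cycles may diverge
        seen.add(d)
        succs = R.get(d, ())
        if not succs:
            return False              # stuck before reaching P
        stack.extend(succs)
    return True                       # every path hits P or runs forever
```

With `R = {'a': ['b', 'c'], 'b': [], 'c': ['c']}`, the claim a ⇒∀ {b} holds (the branch through `'c'` diverges, which partial correctness permits), while a ⇒∀ {c} fails because the path through `'b'` gets stuck.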

The universe of validity notions that can be expressed coinductively, and thus the universe of instances of Theorem 1 is virtually limitless. Below is another notion of validity that we experimented with in our Coq formalization [16]. When proving global program invariants or safety properties of non-deterministic programs, we want to state not only reachability claims <sup>c</sup> <sup>⇒</sup> <sup>P</sup>, but also that all the transitions from c to configurations in P respect some additional property, say T. For example, a global state invariant I can be captured by a T such that (a, b) <sup>∈</sup> <sup>T</sup> iff <sup>I</sup>(a) and <sup>I</sup>(b), while an arbitrary safety property can be captured by a T that encodes a monitor for it. This notion of validity, which we call (all-path) "until" validity, is the greatest fixpoint of:

$$\begin{aligned} \mathsf{until}^\forall(S) = \{(c, T, P) \mid{}& c \in P \lor ((\exists d.\, c \to_R d)\\ &\land \forall d.\, (c \to_R d \text{ implies } (c, d) \in T \land (d, T, P) \in S))\} \end{aligned}$$

This allows verification of properties that are not expressible using Hoare logic.

### **3.2 Example Proof: Sum**

Now we demonstrate the results above by providing all the details that were skipped in our informal proof in Sect. 2.2. The property that we want to prove, expressed as a set of claims (c, P), is

$$S \equiv \{(\langle \texttt{s=0; while(--n)\{s=s+n;\}}\ T \mid \texttt{n} \mapsto n, \sigma[\bot/\texttt{s}]\rangle, \{\langle T \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i, \sigma\rangle\}) \mid \forall n, T, \sigma\}$$

We have to prove S ⊆ valid_R. Note that this specification is more general than the specifications in Sect. 2.2. Here, T represents the remainder of the code to be executed, while σ represents the remainder of the store, with σ[⊥/s] being σ restricted to Dom(σ) \ {s}. Thus, we write out the entire configuration here, which gives us freedom in expressing more complex specifications if needed.

Instead of proving this directly, we will prove two subclaims valid and connect them via sequential composition (Definition 1). First, we need the following:

**Lemma 2.** S₁ ⨟ S₂ ⊆ valid_R *if* S₁ ⊆ valid_R *and* S₂ ⊆ valid_R*.*

As before, let

$$\begin{aligned} Q_n &\equiv \{\langle \texttt{loop;}\ T \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i, \sigma\rangle \mid \forall n'\} \\ T_n &\equiv \{\langle T \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i, \sigma\rangle\} \end{aligned}$$

and define

$$\begin{aligned} S_1 &\equiv \{(\langle \texttt{s=0; loop;}\ T \mid \texttt{n} \mapsto n, \sigma[\bot/\texttt{s}]\rangle, Q_n) \mid \forall n, T, \sigma\} \\ S_2 &\equiv \{(\langle \texttt{loop;}\ T \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i, \sigma\rangle, T_n) \mid \forall n, n', T, \sigma\} \end{aligned}$$

Since S ⊆ S₁ ⨟ S₂ (by Q_n), it suffices to show S₁ ∪ S₂ ⊆ valid_R. To prove S₁ ⊆ valid_R, by Lemma 1 we show S₁ ⊆ step_R(step*_R(S₁)). Regardless of the employed executable semantics, the following should hold:

$$\forall n, T, \sigma.\ \langle \texttt{s=0; loop;}\ T \mid \texttt{n} \mapsto n, \sigma[\bot/\texttt{s}]\rangle \to_R^+ \langle \texttt{loop;}\ T \mid \texttt{n} \mapsto n, \texttt{s} \mapsto 0, \sigma\rangle$$

Choosing the second case of the disjunction in step<sup>R</sup> with d matching this step, it suffices to show

$$\{(\langle \texttt{loop;}\ T \mid \texttt{n} \mapsto n, \texttt{s} \mapsto 0, \sigma\rangle, Q_n) \mid \forall n, T, \sigma\} \subseteq \mathsf{step}_R^*(S_1)$$

Note that we can unfold any fixpoint F∗(S) to get the following two equations:

$$F(F^\*(S)) \subseteq F(F^\*(S)) \cup S = F^\*(S) \qquad S \subseteq F(F^\*(S)) \cup S = F^\*(S) \tag{3}$$

We use the first equation to expose an application of step_R on the right-hand side, so it suffices to show the above is a subset of step_R(step*_R(S₁)). We then use the first case of the disjunction (showing c ∈ P) in step_R, and instantiating n' to n proves this goal, since Σ_{i=n}^{n-1} i = 0. Thus S₁ ⊆ valid_R.

Now we prove S₂ ⊆ valid_R, i.e., S₂ ⊆ step_R(step*_R(S₂)). First, note that the operational semantics of IMP rewrites while loops to if statements. Then, by the definition of step_R, it suffices to show that

$$\{(\langle \texttt{if(--n)\{s=s+n; loop\};}\ T \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i, \sigma\rangle, T_n) \mid \forall n, n', T, \sigma\} \subseteq \mathsf{step}_R^*(S_2)$$

Using the first unfolding from (3), it suffices to show the above is a subset of step_R(step*_R(S₂)), i.e., we expose an application of step_R on the right-hand side. The definition of step_R thus allows the left-hand side to continue taking execution steps, as long as we keep unfolding the fixpoint. Continuing this way, the if condition becomes a single, but symbolic, boolean value. Specifically, it suffices to show:

$$\{(\langle \texttt{if}(n'{-}1 \neq 0)\texttt{\{s=s+n; loop\};}\ T \mid \texttt{n} \mapsto n'{-}1, \texttt{s} \mapsto \sum_{i=n'}^{n-1} i, \sigma\rangle, T_n) \mid \forall n, n', T, \sigma\} \subseteq \mathsf{step}_R^*(S_2)$$

Further progress requires making a case distinction on whether n' − 1 = 0. A case distinction corresponds to observing that A ∪ B ⊆ X if both A ⊆ X and B ⊆ X. Here we split the current set of claims into those with n' − 1 = 0 and those with n' − 1 ≠ 0, and separately establish the following inclusions:

$$\{(\langle \texttt{if(false)\{s=s+n; loop\};}\ T \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i, \sigma\rangle, T_n) \mid \forall n, T, \sigma\} \subseteq \mathsf{step}_R^*(S_2)$$

$$\{(\langle \texttt{if(true)\{s=s+n; loop\};}\ T \mid \texttt{n} \mapsto n'{-}1, \texttt{s} \mapsto \sum_{i=n'}^{n-1} i, \sigma\rangle, T_n) \mid \forall n, n' \neq 1, T, \sigma\} \subseteq \mathsf{step}_R^*(S_2)$$

Continuing symbolic execution and using $\sum_{i=n'}^{n-1} i + (n'-1) = \sum_{i=n'-1}^{n-1} i$, we get

$$\begin{aligned} &\{(\langle T \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i, \sigma\rangle, T_n) \mid \forall n, T, \sigma\} \subseteq \mathsf{step}_R^*(S_2),\\ &\{(\langle \texttt{loop;}\ T \mid \texttt{n} \mapsto n'{-}1, \texttt{s} \mapsto \sum_{i=n'-1}^{n-1} i, \sigma\rangle, T_n) \mid \forall n, n', T, \sigma,\ n'{-}1 \neq 0\} \subseteq \mathsf{step}_R^*(S_2). \end{aligned}$$

In the n' − 1 = 0 case, the current configuration is already in the corresponding target set. To conclude, we expose another application of step_R as before, but use the clause c ∈ P of the disjunction in step_R to leave the trivial goal ∀n, T, σ. ⟨T | n ↦ 0, s ↦ n(n−1)/2, σ⟩ ∈ {⟨T | n ↦ 0, s ↦ n(n−1)/2, σ⟩}. For the n' − 1 ≠ 0 case, we have a set of claims that are contained in the initial specification S₂. We conclude by showing S₂ ⊆ step*_R(S₂), which follows from the second equation in (3) by noting that S ⊆ F*(S) for any F. This set of claims is contained in S₂ by instantiating the universally quantified variable n' in the definition of S₂ with n' − 1. Thus it is contained in step*_R(S₂), and thus it is a subset of valid_R.
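The key inclusion, that one loop iteration maps the invariant for n' back to the invariant for n' − 1, can be spot-checked numerically; a small Python sketch (our own encoding of the store as a pair (n, s)):

```python
def loop_body(n, s):
    """One full iteration of while(--n){s=s+n}: the condition --n
    decrements n, then the body adds the decremented n to s."""
    n -= 1
    return n, s + n

def inv(N, np):
    """Invariant store at the loop head: n = n', s = sum_{i=n'}^{N-1} i."""
    return np, sum(range(np, N))

# The coinductive step: from the invariant with n' != 1, one iteration
# reaches the invariant with n' - 1 -- the inclusion back into S_2.
for N in range(1, 9):
    for np in range(2, N + 1):
        assert loop_body(*inv(N, np)) == inv(N, np - 1)
```

The n' = 1 case is the exit: `inv(N, 1)` already carries s = Σ_{i=1}^{N−1} i, the value demanded by T_n.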

### **3.3 Example Proof: Reverse**

Consider now the following program to reverse a linked list, written in the HIMP language (Fig. 5a). We discuss HIMP in more detail in Sect. 4.

decl p; decl y; p := 0; while (x<>0) { y := *(x+1); *(x+1) := p; p := x; x := y; }

Call the above code rev and the loop rev-loop. We prove this program is correct following intuitions from separation logic [19,20] but using the exact same coinductive technical machinery as before. Assuming we have a predicate that matches a heap containing only a linked list starting at address x and representing the list l (which we will see in Sect. 4.2), our specification becomes:

$$S \equiv \{(\langle \texttt{rev};\ T \mid \text{list}(l, x)\rangle, \{\langle T \mid \lambda r.\,\text{list}(\text{rev}(l), r)\rangle\}) \mid \forall l, x, T\}$$

where rev is the mathematical list reverse. We proceed as in the previous example, first using Lemma 2 and then stepping with the semantics, but with Q_n as

$$\{\langle \texttt{rev-loop};\ T \mid \text{list}(A, x) * \text{list}(B, p) * \texttt{x} \mapsto x * \texttt{p} \mapsto p * \texttt{y} \mapsto y * \lambda r.\,\text{list}(\text{rev}(A) {+\!\!+} B, r)\rangle \mid \forall A, B, x, p, y\}$$

where ++ is list append. We continue as before to prove our original specification. S<sup>1</sup> and S<sup>2</sup> follow from our choice for Qn, our "loop invariant." Specifically,

$$\begin{aligned} S_1 &\equiv \{(\langle \texttt{rev};\ T \mid \text{list}(l, x)\rangle, Q_n) \mid \forall l, x, T\} \\ S_2 &\equiv \{(\langle \texttt{rev-loop};\ T \mid \text{list}(A, x) * \text{list}(B, p) * \texttt{x} \mapsto x * \texttt{p} \mapsto p * \texttt{y} \mapsto y * \lambda r.\,\text{list}(\text{rev}(A) {+\!\!+} B, r)\rangle,\\ &\qquad \{\langle T \mid \lambda r.\,\text{list}(\text{rev}(A) {+\!\!+} B, r)\rangle\}) \mid \forall A, B, x, p, y, T\} \end{aligned}$$

Then, the individual proofs for these specifications closely follow the same flavor as in the previous example: use step<sup>R</sup> to execute the program via the operational semantics, use unions to case split as needed, and finish when we reach something in the target set or that was previously in our specification. The inherent similarity between these two examples hints that automation should not be too difficult. We go into detail regarding such automation in Sect. 4.
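To see the specification in concrete terms, one can model the HIMP heap as a Python dict and run the loop; a sketch (layout and helper names are ours: a node at address a stores its element at a and its next pointer at a + 1, with 0 as the null pointer):

```python
def alloc_list(heap, xs, base=100):
    """Lay out xs as a linked list in heap; return the head pointer."""
    head = 0
    for i, v in enumerate(reversed(xs)):
        a = base + 2 * i
        heap[a], heap[a + 1] = v, head
        head = a
    return head

def rev(heap, x):
    """The rev loop: while (x<>0) { y := *(x+1); *(x+1) := p; p := x; x := y; }"""
    p = 0
    while x != 0:
        y = heap[x + 1]        # save the old next pointer
        heap[x + 1] = p        # point the node back at the reversed prefix
        p, x = x, y
    return p

def as_list(heap, p):
    """Read back the abstract list l such that list(l, p) holds."""
    out = []
    while p != 0:
        out.append(heap[p])
        p = heap[p + 1]
    return out
```

Allocating [1, 2, 3], running `rev`, and reading back yields [3, 2, 1], matching the postcondition list(rev(l), r) on the same heap cells.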

Reasoning with fixpoints and functions like step<sup>R</sup> can be thought of as reasoning with proof rules, but ones which interact with the target programming language only through its operational semantics. The step<sup>R</sup> operation corresponds, conceptually, to two such proof rules: taking an execution step and

**HIMP**

```
append(x, y)
  decl p;
  if (!x) return y;
  p := x;
  while(*(p+1)<>0) p := *(p+1);
  *(p+1) := y;
  return x;
```

**Stack**

```
: append over if over begin
1+ dup @ dup while nip repeat
drop ! else nip then ;
```

**Lambda**

```
(λ (λ IfNil 1 0
 ((λ (λ 0 0)
     (λ 1 (λ 1 1 0))) (λ
  (λ (λ (λ 0 1)) (Deref 0)
    (λ IfNil (Cdr 0)
      ((λ 5) (Assign 0
        (Cons (Car 0) 3)))
      (2 (Cdr 0)))))
  1)))
```
**Fig. 4.** Destructive list append in three languages.

showing that the current configuration is in the target set. Sequential composition and the trans rule correspond to a transitivity rule used to chain together separate proofs. Unions correspond to case analysis. The fixpoint in the closure definition corresponds to iterative uses of these proof rules, or to referring back to claims in the original specification.

### **4 Experiments**

Now that we have proved the correctness of our coinductive verification approach and have seen some simple examples, we must consider the following pragmatic question: "Can this simple approach really work?". We have implemented it in Coq, and specified and verified programs in a variety of languages, each language being defined as an operational semantics [16]. We show not only that coinductive program verification is feasible and versatile, but also that it is amenable to highly effective proof automation. The simplifications in the manual proof, such as taking many execution steps at once, translate easily into proof tactics.

We first discuss the example languages and programs, and the reusable elements in specifications, especially an effective style of representation predicates for heap-allocated data structures. Then we show how we wrote specifications for example programs. Next, we describe our proof automation, which is based on an overall heuristic applied unchanged for each language, though parameterized over subroutines that required somewhat more customization. Finally, we conclude with a discussion of our verification of the Schorr-Waite graph-marking example and of our support for verifying divergent programs.

#### **4.1 Languages**

We discuss three languages following different paradigms, each defined operationally. Many language semantics are available with the distributions of K [9], PLT-Redex [10], and Ott [11], for example, but we believe these three languages are sufficient to illustrate the language-independence of our approach. Figure 4 shows a destructive linked-list append function in each of the three languages.

**HIMP** (IMP with Heap) is an imperative language with (recursive) functions and a heap. The heap addresses are integers, to demonstrate reasoning about low-level representations, and memory allocation/deallocation are primitives. The configuration is a 5-tuple of the current code, a local variable environment mapping identifiers to values, a call stack with frames as pairs of code and environment, a heap, and a collection of functions as a map from function name to definition.

**Stack** is a Forth-like stack based language, though, unlike in Forth, we do make control structures part of the grammar. A shared data stack is used both for local state and to communicate between function invocations, eliminating the store, formal parameters on function declarations, and the environment of stack frames. Stack's configuration is also a 5-tuple, but instead of a current environment there is a stack of values, and stack frames do not store an environment.

**Lambda** is a call-by-value lambda calculus, extended with primitive integers, pair and nil values, and primitive operations for heap access. Fixpoint combinators enable recursive definitions without relying on primitive support for named functions. We use De Bruijn indices instead of named variables. The semantics is based on a CEK/CESK machine [21,22], extended with a heap. Lambda's configuration is a 4-tuple: current expression, environment, heap, continuation.


**Fig. 5.** Syntax of **HIMP**, **Stack**, and **Lambda**

### **4.2 Specifying Data Structures**

Our coinductive verification approach is agnostic to how claims in <sup>C</sup> × P(C) are specified. In Coq, we can specify sets using any definable predicates. Within this design space, we chose matching logic [23] for our experiments, which introduces patterns that concisely generalize the formulae of first order logic (FOL) and separation logic, as well as term unification. Symbols apply on patterns to build other patterns, just like terms, and patterns can be combined using FOL connectives, just like formulae. E.g., pattern <sup>P</sup> <sup>∧</sup><sup>Q</sup> matches a value if <sup>P</sup> and <sup>Q</sup> both match it, [t] matches only the value <sup>t</sup>, <sup>∃</sup>x.P matches if there is any assignment of x under which P matches, and [[ϕ]] where ϕ is a FOL formula matches any value if ϕ holds, and no values otherwise (in [23] neither [t] nor [[ϕ]] require a visible marker, but in Coq patterns are a distinct type, requiring explicit injections).

To specify programs manipulating heap data structures, we use patterns matching subheaps that contain a data structure representing an abstract value. Following [24], we define representation predicates for data structures as functions from abstract values to more primitive patterns. The basic ingredients are primitive map patterns: the pattern **emp** for the empty map, k ↦ v for the singleton map binding key k to value v, and P ∗ Q for maps that are a disjoint union of submaps matching P and, respectively, Q. We use the abbreviation ⟨ϕ⟩ ≡ [[ϕ]] ∧ **emp** to facilitate inline assertions, and p ↦ {v₀, ..., v_i} ≡ p ↦ v₀ ∗ ... ∗ (p + i) ↦ v_i to describe values at contiguous addresses. A heap pattern for a linked list starting at address p and holding list l is defined recursively by

$$\begin{aligned} \text{list}(\text{nil}, p) &= \langle p = 0 \rangle \\ \text{list}(x:l, p) &= \langle p \neq 0 \rangle \ast \exists p\_l \, . p \mapsto \{x, p\_l\} \ast \text{list}(l, p\_l) \end{aligned}$$
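These primitive patterns have a direct shallow embedding as heap predicates; a Python sketch (encoding is ours: heaps as dicts from addresses to values, a pattern as a predicate on heaps, with `sep` trying all disjoint splits, which is exponential but fine for tiny heaps):

```python
from itertools import combinations

def emp(h):
    """emp matches exactly the empty map."""
    return h == {}

def mapsto(k, v):
    """k |-> v matches exactly the singleton heap {k: v}."""
    return lambda h: h == {k: v}

def sep(P, Q):
    """P * Q: the heap splits into disjoint parts satisfying P and Q."""
    def go(h):
        keys = list(h)
        for r in range(len(keys) + 1):
            for ks in combinations(keys, r):
                h1 = {k: h[k] for k in ks}
                h2 = {k: h[k] for k in keys if k not in ks}
                if P(h1) and Q(h2):
                    return True
        return False
    return go

def lst(l, p):
    """list(l, p) per the recursive definition above: nil requires p = 0
    on the empty subheap; x:l requires cells p, p+1 and list(l, *(p+1))."""
    if not l:
        return lambda h: p == 0 and emp(h)
    x, rest = l[0], l[1:]
    def go(h):
        if p == 0 or h.get(p) != x or p + 1 not in h:
            return False
        pl = h[p + 1]
        return sep(sep(mapsto(p, x), mapsto(p + 1, pl)), lst(rest, pl))(h)
    return go
```

The heap {100: 1, 101: 102, 102: 2, 103: 0} satisfies lst([1, 2], 100) but fails lst([1], 100): since ∗ must account for the whole heap, leftover cells rule the smaller pattern out.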

We also define list_seg(l, e, p) for list segments, useful in algorithms using pointers into the middle of a list, by generalizing the constant 0 (the pointer to the end of the list) to a trailing pointer parameter e. We similarly define simple binary trees:

$$\begin{aligned} \text{tree}(\text{leaf}, p) &= \langle p = 0\rangle \\ \text{tree}(\text{node}(x, l, r), p) &= \langle p \neq 0\rangle * \exists p_l, p_r.\, p \mapsto \{x, p_l, p_r\} * \text{tree}(l, p_l) * \text{tree}(r, p_r) \end{aligned}$$

Given such patterns, specifications and proofs can be done in terms of the abstract values represented in memory. Moreover, such primitive patterns are widely reusable across different languages, and so is our proof automation that deals with primitive patterns. Specifically, our proof scripting specific to such pattern definitions is concerned exclusively with unfolding the definition when allowed, and with deciding what abstract value, if any, is represented at a given address in a partially unfolded heap. The latter is further used to decide how another claim applies to the current state when attempting a transitivity step.

#### **4.3 Specifying Reachability Claims**

As mentioned, claims in $C \times \mathcal{P}(C)$ can be specified using any logical formalism, here the full power of Coq. An explicit specification can be verbose and low-level,

### **Table 1.** Example list specifications

$$\begin{array}{l}
call(\text{Read},[x],\ [H] \land \text{list}(v:l,x),\ \lambda r.\langle r = v\rangle \ast [H])\\
call(\text{Tail},[x],\ [H] \land \text{list}(v:l,x),\ \lambda r.[H] \land (x \mapsto \{v,r\} \ast \text{list}(l,r)))\\
call(\text{Add},[y,x],\ \text{list}(l,x),\ \lambda r.\text{list}(y:l,r))\\
call(\text{Add}',[y,x],\ [H] \land \text{list}(l,x),\ \lambda r.\text{list}(y:l,r) \land (r \mapsto \{y,x\} \ast [H]))\\
call(\text{Swap},[x],\ \text{list}(a:b:l,x),\ \lambda r.\text{list}(b:a:l,x))\\
call(\text{Dealloc},[x],\ \text{list}(l,x),\ \lambda r.\textbf{emp})\\
call(\text{Length},[x],\ [H] \land \text{list}(l,x),\ \lambda r.\langle r = \text{len}(l)\rangle \ast [H])\\
call(\text{Sum},[x],\ [H] \land \text{list}(l,x),\ \lambda r.\langle r = \text{sum}(l)\rangle \ast [H])\\
call(\text{Reverse},[x],\ \text{list}(l,x),\ \lambda r.\text{list}(\text{rev}(l),r))\\
call(\text{Append},[x,y],\ \text{list}(a,x) \ast \text{list}(b,y),\ \lambda r.\text{list}(a\text{++}b,r))\\
call(\text{Copy},[x],\ [H] \land \text{list}(l,x),\ \lambda r.\text{list}(l,r) \ast [H])\\
call(\text{Delete},[v,x],\ \text{list}(l,x),\ \lambda r.\text{list}(\text{delete}(v,l),r))
\end{array}$$

especially when many semantic components in the configuration stay unchanged. However, any reasonable logic allows making definitions to reduce verbosity and redundancy. Our use of matching logic particularly facilitates expressing frame conditions, allowing us to regain the compactness and elegance of Hoare logic or separation logic specifications with definable syntactic sugar. For example, defining

$$\begin{aligned}
call(f(\mathit{formals})\{\mathit{body}\}, \mathit{args}, P\_{in}, P\_{out}) = \{\,&((f(\mathit{args}) \curvearrowright \mathit{rest}, \mathit{env}, \mathit{stk}, \mathit{heap}, \mathit{funs}),\\
&\ \{(r \curvearrowright \mathit{rest}, \mathit{env}, \mathit{stk}, \mathit{heap}', \mathit{funs}) \mid \forall r, \mathit{heap}'.\ \mathit{heap}' \models P\_{out}(r) \ast [H\_f]\})\\
&\mid \forall \mathit{rest}, \mathit{env}, \mathit{stk}, \mathit{heap}, H\_f, \mathit{funs}.\ \mathit{heap} \models P\_{in} \ast [H\_f] \land f(\mathit{formals})\{\mathit{body}\} \in \mathit{funs}\,\}
\end{aligned}$$

gives the equivalent of the usual Hoare pre-/post-condition on function calls, including heap framing (in separation logic style). The notation $x \curvearrowright y$ represents the order of evaluation: evaluate $x$ first, followed by $y$. This is often used when $y$ can depend on the value $x$ takes after evaluation.

The first parameter is the function definition, and the second is the list of arguments. The heap effect is described by a pattern $P_{in}$ for the allowable initial states of the heap and a function $P_{out}$ from returned values to corresponding heap patterns. For example, we specify the definition $D$ of append in Fig. 4 by writing $call(D, [x, y], (\text{list}(a, x) \ast \text{list}(b, y)), (\lambda r.\text{list}(a\text{++}b, r)))$, which is as compact and elegant as it can be. More specifications are given in Table 1. A number of specifications assert that part of the heap is left entirely unchanged by writing $[H] \land \ldots$ in the precondition to bind a variable $H$ to a specific heap, and using that variable in the postcondition (merely repeating a representation predicate would permit a function to reallocate internal nodes of a data structure to different addresses). The specifications Add and Add' show that it is a bit more complicated to assert that an input list is used undisturbed as a suffix of a result list. Specifications such as Length, Append, and Delete are written in terms of corresponding mathematical functions on the lists represented in the heap, separating those functional descriptions from details of memory layout.

When a function contains loops, proving that it meets a specification often requires making some additional claims about configurations which are just about to enter loops, as we saw in Sect. 2.2. We support this with another pattern that takes the current code at an intermediate point in the execution of a function, and a description of the environment:

$$\begin{aligned}
stmt(\mathit{code}, \mathit{env}, P\_{in}, P\_{out}) = \{\,&((\mathit{code}, \mathit{env}, \mathit{stk}, \mathit{heap}, \mathit{funs}),\\
&\ \{(\mathtt{return}\ r \curvearrowright \mathit{rest}, \mathit{env}', \mathit{stk}, \mathit{heap}', \mathit{funs}) \mid \forall r, \mathit{rest}, \mathit{env}', \mathit{heap}'.\ \mathit{heap}' \models P\_{out}(r) \ast [H\_f]\})\\
&\mid \forall \mathit{stk}, \mathit{heap}, H\_f, \mathit{funs}.\ \mathit{heap} \models P\_{in} \ast [H\_f]\,\}
\end{aligned}$$

Verifying that the definition of append in Fig. 4 meets the call specification above requires an auxiliary claim about the loop, which can be written using *stmt* as

$$\begin{aligned}
&stmt(\mathtt{while(*(p+1)<>0)}\ldots,\ (\mathtt{x} \mapsto x, \mathtt{y} \mapsto y, \mathtt{p} \mapsto p),\\
&\quad (\text{listseg}(l\_x, p, x) \ast \text{list}(l\_p, p) \ast \text{list}(l\_y, y)),\ (\lambda r.\,\text{list}(l\_x\text{++}l\_p\text{++}l\_y, r)))
\end{aligned}$$

The patterns above were described using HIMP's configurations; we defined similar ones for Stack and Lambda also.

### **4.4 Proofs and Automation**

The basic heuristic in our proofs, which is also the basis of our proof automation, is to attack a goal by preferring to prove that the current configuration is in the target set if possible, then trying to use claims in the specification by transitivity, and only last resorting to taking execution steps according to the operational semantics or making case distinctions. Each of these operations begins, as in the example proofs, with certain manipulations of the definitions and fixpoints in the language-independent core. Our heuristic is reusable, as a proof tactic parameterized over sub-tactics for the more specific operations. A prelude to the main loop begins by applying the main theorem to move from claiming validity to showing a coinduction-style inclusion, and breaking down a specification with several classes of claims into a separate proof goal for each family of claims.
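The heuristic can be pictured as a small Ltac skeleton. This is a hedged sketch: the sub-tactic names `finish`, `use_claim`, and `take_step` are illustrative placeholders for the development's actual sub-tactics, not its real identifiers.

```coq
(* Hedged sketch of the main proof-search loop, parameterized over
   sub-tactics. Preference order: first try to show the current
   configuration is in the target set, then try transitivity through a
   specification claim, and only as a last resort take an operational
   step or make a case distinction. *)
Ltac attack finish use_claim take_step :=
  repeat first
    [ finish          (* configuration is in the target set          *)
    | use_claim       (* apply another claim, by transitivity        *)
    | take_step ].    (* execute one semantic step / case split      *)
```

Because each branch restores the goal to a reachability claim, the loop can be re-entered after any manual proof steps, as described below.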

Additionally, our automation leverages support offered by the proof assistant, such as handling conjuncts by trying to prove each case, existentials by introducing a unification variable, equalities by unification, and so on. Moreover, we added tactics for map equalities and numerical formulae, which are shared among all languages involving maps and integers. The current proof goal after each step is always a reachability claim. So even in proofs which are not completely automatic, the proof automation can give up by leaving subgoals for the user, who can reinvoke the proof automation after making some proof steps of their own as long as they leave a proof goal in the same form.

Proving the properties in Table 1 sometimes required making additional claims about while loops or auxiliary recursive functions. All but the last four were proved automatically by invoking (an instance of) our heuristic proof tactic:

Proof. list\_solver. Qed.

Append and Copy needed the associativity of list append. Reverse used a loop reversing the input list element by element onto an output list, which required relating the tail-recursive *rev app*(x : l, y) = *rev app*(l, x : y) with the Coq standard library definition *rev*(x : l) = *rev*(l)++[x]. Manually applying these lemmas changes the proof scripts only slightly, to

list\_solver. rewrite app\_ass in \* |- . list\_run.

list\_solver. rewrite <- rev\_alt in \* |- . list\_run.

These proofs were used verbatim for each of our example languages. The only exceptions were Append and Copy for Lambda, for which the app\_ass lemma was not necessary. Delete requires simple reasoning about *delete*(v, l) in the cases where v is and is not at the head of the list, though the actual reasoning in Coq varies between our example languages. No additional lemmas or tactics equivalent to Hoare rules are needed in any of these proofs.

### **4.5 Other Data Structures**

Matching logic allows us to concisely define many other important data structures. Besides lists, we also have proofs in Coq with trees, graphs, and stacks [16]. These data structures are all used for proving properties about the Schorr-Waite algorithm. In the next section we go into more detail about these data structures and how they are used in proving the Schorr-Waite algorithm.

#### **4.6 Schorr-Waite**

Our experiments so far demonstrate that our coinductive verification approach applies across languages in different paradigms, and can handle the usual heap programs with a high degree of automation. Here we show that we can also handle the famous Schorr-Waite graph marking algorithm [25], a well-known verification challenge: "The Schorr-Waite algorithm is the first mountain that any formalism for pointer aliasing should climb" [26]. To give the reader a feel for what it takes to mechanically verify such an algorithm, previous proofs in [27] and [28] required manually produced proof scripts of about 470 and over 1,400 lines respectively, both using conventional Hoare logic. In comparison, our proof is 514 lines. Line counts are a crude measure, but we can at least conclude that the language independence and generality of our approach did not impose any great cost compared to using language-specific program logics.

The version of Schorr-Waite that we verified is based on [29]. First, however, we verify a simpler property of the algorithm, showing that the given code correctly marks a tree, in the absence of sharing or cycles. Then we prove that the same code works on general graphs by considering the tree resulting from a depth-first traversal. We define graphs by extending the definition of trees to allow a child of a node in an abstract tree to be a reference back to some existing node, in addition to an explicit subtree or a null pointer for a leaf. To specify that graph nodes are at their original addresses after marking, we include an address along with the mark flag in the abstract data structure used in the pattern

$$\begin{aligned} \text{grph}(\text{leaf}, m, p') &= \langle p'=0 \rangle \\ \text{grph}(\text{backref}(p), m, p') &= \langle p'=p \rangle \\ \text{grph}(\text{node}(p, l, r), m, p') &= \langle p'=p \rangle \* \exists p\_l, p\_r \ . \\ p &\mapsto \{m, p\_l, p\_r\} \* \text{grph}(l, m, p\_l) \* \text{grph}(r, m, p\_r) \end{aligned}$$

The overall specification is *call*(*Mark*, [p], grph(G, 0, p), λr.grph(G, 3, p)).

To describe the intermediate states in the algorithm, including the clever pointer-reversal trick used to encode a stack, we define another data structure for the context, in zipper style. A position into a tree is described by its immediate context, which is either the topmost context, or the point immediately left or right of a sibling tree, in a parent context. These are represented by nodes with intermediate values of the mark field, with one field pointing to the sibling subtree and the other pointing to the representation of the rest of the context.

$$\begin{aligned} \text{stack}(\text{Top}, p) &= \langle p = 0 \rangle \\ \text{stack}(\text{LeftOf}(r, k), p) &= \exists p\_r, p\_k \cdot p \mapsto \{1, p\_r, p\_k\} \ast \text{grph}(r, 0, p\_r) \ast \text{stack}(k, p\_k) \\ \text{stack}(\text{RightOf}(l, k), p) &= \exists p\_l, p\_k \cdot p \mapsto \{2, p\_k, p\_l\} \ast \text{stack}(k, p\_k) \ast \text{grph}(l, 3, p\_l) \end{aligned}$$

This is the second data structure needed to specify the main loop. When it is entered, there are only two live local variables, one pointing to the next address to visit and the other keeping context. The next node can either be the root of an unmarked subtree, with the context as stack, or the first node in the implicit stack when ascending after marking a tree, with the context pointing to the node that was just finished. For simplicity, we write a separate claim for each case.

$$\begin{aligned}
&stmt(\mathit{Loop}, (\mathtt{p} \mapsto p, \mathtt{q} \mapsto q),\ (\text{grph}(G, 0, p) \ast \text{stack}(S, q)),\ \lambda r.\,\text{grph}(\mathit{plug}(G, S), 3))\\
&stmt(\mathit{Loop}, (\mathtt{p} \mapsto p, \mathtt{q} \mapsto q),\ (\text{stack}(S, p) \ast \text{grph}(G, 3, q)),\ \lambda r.\,\text{grph}(\mathit{plug}(G, S), 3))
\end{aligned}$$
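The abstract values appearing in these claims can be sketched as Coq datatypes. The following is a hedged rendering under our own naming; in particular, carrying the parent address in each context frame (so that `plug` can rebuild the marked graph) is an assumption of this sketch, not necessarily how the development in [16] structures it.

```coq
Require Import ZArith.

(* Graphs: trees extended with back-references; nodes carry their own
   address so marking can be specified to leave every node in place. *)
Inductive graph : Type :=
  | Leaf    : graph                          (* null pointer                  *)
  | Backref : Z -> graph                     (* edge back to an existing node *)
  | Node    : Z -> graph -> graph -> graph.  (* address, left, right          *)

(* Zipper contexts encoding the reversed-pointer stack. *)
Inductive ctx : Type :=
  | Top     : ctx
  | LeftOf  : Z -> graph -> ctx -> ctx   (* parent addr, unmarked right sibling *)
  | RightOf : Z -> graph -> ctx -> ctx.  (* parent addr, marked left sibling    *)

(* plug(G, S): rebuild the full graph from a subgraph and its context. *)
Fixpoint plug (g : graph) (k : ctx) : graph :=
  match k with
  | Top            => g
  | LeftOf  a r k' => plug (Node a g r) k'
  | RightOf a l k' => plug (Node a l g) k'
  end.
```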

The application of all the semantic steps was handled entirely automatically; the manual proof effort was entirely concerned with reasoning about the predicates above, for which no proof automation was developed.

### **4.7 Divergence**

Our coinductive framework can also be used to verify that a program diverges. Such verification is often a topic given its own treatment, as in [30,31], though in our framework no additional care is needed. To prove that a program diverges on all inputs, one verifies a set of claims of the form $(c, \emptyset)$, so that no configuration can be determined valid by membership in the (empty) set of target states. We have verified the divergence of a simple program under each style of IMP semantics in Fig. 3, as well as programs in each language from Sect. 4.1. These programs include the omega combinator and the sum program from Sect. 3.2 with the loop guard replaced by true.
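Such a divergence specification can be stated by pairing each configuration of interest with the empty target set. A minimal hedged sketch, under our own naming:

```coq
(* Hedged sketch: a claim pairs a configuration with a target set of
   configurations. Divergence claims use the empty target set, so a
   claim can never be discharged by reaching a target; only indefinite
   stepping remains, which the coinductive proof justifies. *)
Definition claim (cfg : Type) : Type := cfg * (cfg -> Prop).

Definition empty_set {cfg : Type} : cfg -> Prop := fun _ => False.

Definition diverges_claim {cfg : Type} (c : cfg) : claim cfg :=
  (c, empty_set).
```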

### **4.8 Summary of Experiments**

Statistics are shown in Table 2. For each example, the size columns show the amount of code to be verified, the size of the specification, and the size of the proof script. If verifying an example required auxiliary definitions or lemmas specific to that example, the size of those definitions was counted with the specification or proof. Many examples were verified by a single invocation of our automatic proof tactic, giving 1-line proofs. Other small proofs required human assistance only in the form of applying lemmas about the domain. Proofs are generally smaller than the specifications, which are usually about as large as the code. This is similar to the results for Bedrock [32], and good for a foundational verification system.


**Table 2.** Proof statistics

The reported "Proof" time is the time for Coq to process the proof script, which includes running proof tactics and proof searches to construct a complete proof. If this run succeeds, it produces a proof certificate file which can be rechecked without that overhead. For an initial comparison with Bedrock we timed their SinglyLinkedList.v example, which verifies length, reverse, and append functions that closely resemble our example code. The total time to run the Bedrock proof script was 93 s, and 31 s to recheck the proof certificate, distinctly slower than our times in Table 2. To more precisely match the Bedrock examples we modified our programs to represent lists nodes with fields at successive addresses rather than using HIMP's records, but this only improved performance, down to 20 s to run the proof scripts, and 4 s to check the certificates.

### **5 Subsuming Reachability Logic**

Reachability logic [33] is a closely related approach to program verification using operational semantics. In fact, our coinductive approach came about when trying to distill reachability logic into its mathematical essence. The practicality of reachability logic has recently been demonstrated, as the reachability logic proof system has been shown to work with several independently developed semantics of real-world languages, such as C, Java, and JavaScript [15].

#### **5.1 Advantages of Coinduction**

A mechanical proof of our soundness theorem gives a more usable verification framework, since reachability logic requires the operational semantics to be given as a set of rewrite rules, while our approach does not. Further, reachability logic fixes a set of syntactic proof rules, while in our approach the mathematical fixpoints and functions act as proof rules without any being explicitly required. In fact, the generality of our approach allows introducing other derived rules without compromising the soundness result. Similarly, the generality allows higher-order verification, which reachability logic cannot handle.

Further, we saw in Sect. 3 that the general proof of our theorem is entirely mathematical. We instantiate it with the $\mathrm{step}_R$ function to get a program verification framework. However, instantiating it with other functions would give frameworks for proving different properties, such as all-path validity or the "until" notion of validity previously mentioned. Reachability logic (whose proof system is shown in Fig. 6) does not support any other notion of validity without changes to its proof system, which then require new proofs of soundness and relative completeness. For our framework, the proof of the main theorem does not need to be modified at all; one only needs to prove that all-path validity is a greatest fixpoint (see Sect. 3). The same is true for any other such property. In this sense, this coinduction framework is much more general than the reachability logic proof system presented in [34].

$$\begin{array}{ll}
\textbf{Axiom}: \dfrac{\varphi \Rightarrow \varphi' \in \mathcal{A}}{\mathcal{A} \vdash_{\mathcal{C}} \varphi \Rightarrow \varphi'}
&
\textbf{Reflexivity}: \dfrac{\vphantom{X}}{\mathcal{A} \vdash \varphi \Rightarrow \varphi}
\\[3ex]
\textbf{Transitivity}: \dfrac{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \Rightarrow^{+} \varphi_2 \quad \mathcal{A} \cup \mathcal{C} \vdash \varphi_2 \Rightarrow \varphi_3}{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \Rightarrow \varphi_3}
&
\textbf{Logic Framing}: \dfrac{\mathcal{A} \vdash_{\mathcal{C}} \varphi \Rightarrow \varphi' \quad \psi \text{ is a FOL formula}}{\mathcal{A} \vdash_{\mathcal{C}} \varphi \land \psi \Rightarrow \varphi' \land \psi}
\\[3ex]
\textbf{Consequence}: \dfrac{\models \varphi_1 \to \varphi_1' \quad \mathcal{A} \vdash_{\mathcal{C}} \varphi_1' \Rightarrow \varphi_2' \quad \models \varphi_2' \to \varphi_2}{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \Rightarrow \varphi_2}
&
\textbf{Case Analysis}: \dfrac{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \Rightarrow \varphi \quad \mathcal{A} \vdash_{\mathcal{C}} \varphi_2 \Rightarrow \varphi}{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \lor \varphi_2 \Rightarrow \varphi}
\\[3ex]
\textbf{Abstraction}: \dfrac{\mathcal{A} \vdash_{\mathcal{C}} \varphi \Rightarrow \varphi' \quad X \cap \mathit{FreeVars}(\varphi') = \emptyset}{\mathcal{A} \vdash_{\mathcal{C}} \exists X\, \varphi \Rightarrow \varphi'}
&
\textbf{Circularity}: \dfrac{\mathcal{A} \vdash_{\mathcal{C} \cup \{\varphi \Rightarrow \varphi'\}} \varphi \Rightarrow \varphi'}{\mathcal{A} \vdash_{\mathcal{C}} \varphi \Rightarrow \varphi'}
\end{array}$$

**Fig. 6.** Reachability logic proof system. The sequent $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$ is shorthand for $\mathcal{A} \vdash_{\emptyset} \varphi \Rightarrow \varphi'$.

### **5.2 Reachability Logic Proof System**

The key construct in reachability logic is the notion of circularity. Circularities, represented as $\mathcal{C}$ in Fig. 6, intuitively represent claims that are conjectured to be true but have not yet been proved. These claims are proved using the Circularity rule, which is analogous in our coinductive framework to referring back to claims previously seen. Most of the other rules in Fig. 6 are less interesting. Transitivity requires progress before the circularities are flushed as axioms; this corresponds to the outer $\mathrm{step}_R$ in our coinductive framework.

There are clear parallels between the reachability logic proof system and our coinductive framework. We have formalized and mechanically verified a detailed proof that reachability logic is an instance of our coinductive verification framework. One can refer to [16] for full details, but we briefly discuss the nature of the proof below.

#### **5.3 Reachability Logic is Coinduction**

To formalize what it means for reachability logic to be an instance of coinduction, we first need some definitions. First, we need a translation from a reachability rule to a set of coinductive claims. In a reachability rule $\varphi \Rightarrow \varphi'$, both $\varphi$ and $\varphi'$ are patterns which respectively describe (symbolically) the starting and the reached configurations. Both $\varphi$ and $\varphi'$ can have free variables. Let *Var* be the set of variables. Then, we define the set of claims

$$S\_{\varphi \Rightarrow \varphi'} \equiv \{ (c, \overline{\rho}(\varphi')) \mid c \in \overline{\rho}(\varphi), \,\forall \rho: Var \to Cfg \}.$$

where *Cfg* is the model of configurations and $\overline{\rho}(\cdot)$ is the extension of the valuation $\rho$ to patterns [15]. Also, let the claims derived from a set of reachability rules $X = \{\varphi_1 \Rightarrow \varphi'_1, \ldots, \varphi_n \Rightarrow \varphi'_n\}$ be:

$$\overline{X} \equiv \bigcup\_{\varphi\_i \Rightarrow \varphi\_i' \in X} S\_{\varphi\_i \Rightarrow \varphi\_i'}$$

In reachability logic, programming language semantics are defined as *theories*, that is, as sets of (one-step) reachability rules $\mathcal{A}$ with patterns over a given signature of symbols. Each theory $\mathcal{A}$ defines a transition relation over the configurations in *Cfg*, say $R_{\mathcal{A}}$, which is then used to define semantic validity in reachability logic, $\mathcal{A} \models \varphi \Rightarrow \varphi'$. It is possible, and easier, to prove our main theorem more generally, for any transition relation $R$ that satisfies $R \models^{+} \mathcal{A}$:

$$R \models^+ \mathcal{A} \text{ if } R \models^+ \varphi \Rightarrow \varphi' \text{ for each } \varphi \Rightarrow \varphi' \in \mathcal{A}$$

where $R \models^{+} \varphi \Rightarrow \varphi'$ if for each $\rho : \mathit{Var} \to \mathit{Cfg}$ and $\gamma \in \mathit{Cfg}$ such that $(\rho, \gamma) \models \varphi$ [33], there is some $\gamma'$ such that $\gamma \to_R \gamma'$ and $(\gamma', \overline{\rho}(\varphi'))$ is a valid reachability claim.

**Lemma 3.** $R_{\mathcal{A}} \models^{+} \mathcal{A}$, *and if* $S_{\varphi \Rightarrow \varphi'} \subseteq \mathrm{valid}_{R_{\mathcal{A}}}$ *then* $\mathcal{A} \models \varphi \Rightarrow \varphi'$*.*

This lemma suggests what to do: take any reachability logic proof of $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$ and any transition relation $R$ such that $R \models^{+} \mathcal{A}$, and produce a coinductive proof of $S_{\varphi \Rightarrow \varphi'} \subseteq \mathrm{valid}_R$. This gives us not only a procedure associating a coinductive proof to each reachability logic proof, but also an alternative method to prove the soundness of reachability logic. This is what we do below:

**Theorem 2.** *If there is a reachability logic proof derivation for* $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$ *and a transition relation* $R$ *such that* $R \models^{+} \mathcal{A}$*, then* $S_{\varphi \Rightarrow \varphi'} \subseteq \mathrm{valid}_R$*, and in particular this holds by applying Theorem 1 to an inclusion* $\overline{\mathcal{C}} \subseteq \mathrm{step}_R(\mathrm{derived}^{*}_R(\overline{\mathcal{C}}))$*. Here,* $\mathrm{derived}_R$ *is a particular function satisfying the conditions for* $G$ *in Theorem 1 (see [16] for more details), and* $\mathcal{C}$ *is a set of reachability rules consisting of* $\varphi \Rightarrow \varphi'$ *along with those reachability rules which appear as conclusions of instances of the Circularity proof rule in the proof tree of* $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$*.*

To prove Theorem 2, we apply the Set Circularity theorem of reachability logic [35], which states that a reachability logic claim $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$ is provable iff there is some set of claims $\mathcal{C}$ such that $\varphi \Rightarrow \varphi' \in \mathcal{C}$ and for each $\varphi_i \Rightarrow \varphi'_i \in \mathcal{C}$ there is a proof of $\mathcal{A} \vdash_{\mathcal{C}} \varphi_i \Rightarrow \varphi'_i$ which does not use the Circularity proof rule. In the forward direction, we can take $\mathcal{C}$ as defined in the statement of Theorem 2. The main idea is to convert proof trees into inclusions of sets of claims:

**Lemma 4.** *Given a proof derivation of* $\mathcal{A} \vdash_{\mathcal{C}} \varphi_a \Rightarrow \varphi_b$ *which does not use the Circularity proof rule (last rule in Fig. 6), if* $R \models^{+} \mathcal{A}$ *and* $\mathcal{C}$ *is nonempty, then* $S_{\varphi_a \Rightarrow \varphi_b} \subseteq \mathrm{step}_R(\mathrm{derived}^{*}_R(\overline{\mathcal{C}}))$*.*

This lemma is proven by strengthening the inclusion into one that can be proven by structural induction over the Reachability Logic proof rules besides Circularity.

Combining this lemma with Set Circularity shows that $\overline{\mathcal{C}} = \bigcup_i S_{\varphi_i \Rightarrow \varphi'_i} \subseteq \mathrm{valid}_R$, which implies that $S_{\varphi \Rightarrow \varphi'} \subseteq \mathrm{valid}_R$, exactly as desired. We have mechanized the proofs of Lemmas 3 and 4 in Coq [16]. This is a major result: it constitutes an independent soundness proof for reachability logic, and it demonstrates the strength of our coinductive framework despite its simplicity. Moreover, it allows proofs done using reachability logic, as in [15], to be translated into mechanically verified Coq proofs, immediately enabling foundational verification of programs written in *any language*.

### **6 Other Related Work**

Here we discuss work other than reachability logic that is related to our coinductive verification system. We discuss commonly used program verifiers, including approaches based on operational semantics and Iris [36], an approach with some language independence. We also discuss related coinduction schemata.

#### **6.1 Current Verification Tools**

A number of prominent tools such as Why [37], Boogie [38,39], and Bedrock [24,32] provide program verification for a fixed language, and support other languages by translation if at all. For example, Frama-C and Krakatoa, respectively, attempt to verify C and Java by translation through Why. Also, Spec# and Havoc, respectively, verify C# and C by translation through Boogie. We are not aware of soundness proofs for these translations. Such proofs would be highly non-trivial, requiring formal semantics of both source and target languages.

All of these systems are based on a verification condition (VC) generator for their programming language. Bedrock is closest in architecture and guarantees to our system, as it is implemented in Coq and verification results in a Coq proof certificate that the specification is sound with respect to a semantics of the object language. Bedrock supports dynamically created code, and modular verification of higher-order functions, for which our framework has preliminary support. Bedrock also makes more aggressive attempts at complete automation, which costs increased runtime. Most fundamentally, Bedrock is built around a VC generator for a fixed target language.

In sharp contrast to the above approaches, we demonstrated that a small-step operational semantics suffices for program verification, without a need to define any other semantics or verification condition generators for the same language. A language-independent, sound and (relatively) complete coinductive proof method then allows us to verify properties of programs directly using the operational semantics. As seen in Sect. 4.8, this language independence does not compromise other desirable properties. The required human effort and the performance of the verification task compare well with foundational program verifiers such as Bedrock, and we provide the same high confidence in correctness: the trust base consists of the operational semantics only.

#### **6.2 Operational Semantics Based Approaches**

Verifiable C [40] is a program verification tool for the C programming language based on an operational semantics for C defined in Coq. Hoare triples are then proved as lemmas about the operational semantics. However, in this approach and other similar approaches, it is necessary to prove such lemmas. Without them, verification of any nontrivial C program would be nearly impossible. In our approach, while we can also define and prove Hoare triples as lemmas, doing so is not needed to make program verification feasible, as demonstrated in the previous sections. We only need some additional domain reasoning in Coq, which logics like Verifiable C require *in addition* to Hoare logic reasoning. Thus, our approach automatically yields a program verification tool for any language with minimal additional reasoning, while approaches such as Verifiable C need over 40,000 lines of Coq to define the program logic. We believe this is completely unnecessary, and hope our coinductive framework will be the first step in eliminating such superfluous logics.

The work by the FLINT group [41–43] is another approach to program verification based on operational semantics. The languages developed use shallowly embedded state predicates in Coq, and inference rules are derived directly from the operational semantics. However, this work is not generic over operational semantics. For example, [43] is developed in the context of a particular machine model, with a fixed memory representation and register file. Even simple changes such as adding registers require updating the soundness proofs. Our approach has a single soundness theorem that can be instantiated for *any* language.

Iris [36] is a concurrent separation logic with a degree of language independence, with operational semantics formalized in Coq. Iris adds monoids and invariants to the program logic in order to facilitate verification. It also derives some Hoare-style verification rules from the semantics of a language. However, some structural Hoare rules depend on the language and must still be added manually. Additionally, once proof rules are generated, they are specialized to that particular language. Further, the verification in the paper relies on Hoare-style reasoning, while our approach does not assume any such verification style, as we work directly with the mathematical specifications. Finally, the monoids used are not generated automatically and are specific to the programming language used.

### **6.3 Other Coinduction Schemata**

A categorical generalization of our key theorem was presented as a recursion scheme in [12,13]. The titular result of the former is the dual of the λ-coiteration scheme of the latter, which specializes to preorder categories to give our Theorem 1. A more recent and more general result is [14], which also generalizes other recent work on coinductive proofs such as [44]. Unlike these approaches, which were presented for showing bisimilarity, the novelty of our approach stems from using these techniques directly to show Hoare-style functional correctness claims, and from developing the attendant machinery and automation that makes this work with a variety of languages, rather than from advancing the already solid mathematical foundations of coinduction. Various weaker coinduction schemes are folklore, such as lemma coinduct3 from Isabelle/HOL's standard library: $\mathit{mono}(f) \land A \subseteq f(\mu x.\, f(x) \cup A \cup \nu f) \implies A \subseteq \nu f$.
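For comparison, the flavor of such strengthened coinduction schemes can be rendered over the powerset lattice in plain Coq. The following is a hedged sketch of a *weaker* folklore scheme (not the paper's Theorem 1), stated and proved from scratch here purely for illustration, with the greatest fixpoint defined as the union of all post-fixpoints:

```coq
Section Coinduction.
  Variable U : Type.
  Variable F : (U -> Prop) -> (U -> Prop).
  Hypothesis F_mono :
    forall (A B : U -> Prop),
      (forall x, A x -> B x) -> forall x, F A x -> F B x.

  (* Greatest fixpoint of F: the union of all post-fixpoints. *)
  Definition nu (x : U) : Prop :=
    exists A : U -> Prop, (forall y, A y -> F A y) /\ A x.

  Lemma nu_unfold : forall x, nu x -> F nu x.
  Proof.
    intros x [A [HA Ax]].
    apply (F_mono A nu).
    - intros y Ay. exists A. auto.
    - apply HA, Ax.
  Qed.

  (* Strengthened coinduction: if A maps under F into A ∪ nu,
     then A is contained in nu. *)
  Lemma coinduction_up_to :
    forall (A : U -> Prop),
      (forall x, A x -> F (fun y => A y \/ nu y) x) ->
      forall x, A x -> nu x.
  Proof.
    intros A H x Ax.
    exists (fun y => A y \/ nu y). split; [ | left; exact Ax ].
    intros y [Ay | Ny].
    - apply H, Ay.
    - apply (F_mono nu (fun z => A z \/ nu z)).
      + intros z Nz. right. exact Nz.
      + apply nu_unfold, Ny.
  Qed.
End Coinduction.
```

The point of the strengthening is visible in `coinduction_up_to`: the candidate set `A` need not be a post-fixpoint itself, only a post-fixpoint up to union with `nu`.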

### **7 Conclusion and Future Work**

We presented a language-independent program verification framework. Proofs can be as simple as with a custom Hoare logic, but only an operational semantics of the target language is required. We have mechanized a proof of the correctness of our approach in Coq. Combining this with a coinductive proof thus produces a Coq proof certificate concluding that the program meets the specification according to the provided semantics. Our approach is amenable to proof automation. Further automation may improve convenience and cannot compromise soundness of the proof system. A language designer need only give an authoritative semantics to enable program verification for a new language, rather than needing to have the experience and invest the effort to design and prove the soundness of a custom program logic.

One opportunity for future work is using our approach to provide proof certificates for reachability logic program verifiers such as K [9]. The K prover has been used to verify programs in several real programming languages [15]. While the proof system is sound, trusting the results of these tools currently requires trusting the implementation of the K system. Our translation in Sect. 5 will allow us to produce Coq proof objects for proofs done in K's backend, so that relying on the results of K's prover requires trusting only Coq's proof checker.

Another area for future work is verifying programs with higher-order specifications, where a specification can make reachability claims about values quantified over in the specification. This allows higher-order functions to have specifications that require functional arguments to themselves satisfy some specification. We have begun preliminary work on proving validity of such specifications using the notions of compatibility up-to presented in [14]. Combining this with more general forms of claims may allow modular verification of concurrent programs, as in RGsep [45]. See [16] for initial work in these areas.

Other areas for future work are evaluating the reusability of proof automation between languages, and exploiting the ability to easily verify programs under a modified semantics, e.g., adding time costs to allow proving real-time properties.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Velisarios: Byzantine Fault-Tolerant Protocols Powered by Coq**

Vincent Rahli(B), Ivana Vukotic, Marcus Völp, and Paulo Esteves-Verissimo

SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg {vincent.rahli,ivana.vukotic,marcus.voelp,paulo.verissimo}@uni.lu

**Abstract.** Our increasing dependence on complex and critical information infrastructures and the emerging threat of sophisticated attacks call for extended efforts to ensure the correctness and security of these systems. Byzantine fault-tolerant state-machine replication (BFT-SMR) provides a way to harden such systems. It ensures that they maintain correctness and availability in an application-agnostic way, provided that the replication protocol is correct and at least n − f out of n replicas survive arbitrary faults. This paper presents Velisarios, a logic-of-events based framework implemented in Coq, which we developed to implement and reason about BFT-SMR protocols. As a case study, we present the first machine-checked proof of a crucial safety property of an implementation of the area's reference protocol: PBFT.

**Keywords:** Byzantine faults · State machine replication · Formal verification · Coq

### **1 Introduction**

Critical information infrastructures such as the power grid or water supply systems assume an unprecedented role in our society. On one hand, our lives depend on the correctness of these systems. On the other hand, their complexity has grown beyond manageability. One state of the art technique to harden such critical systems is Byzantine fault-tolerant state-machine replication (BFT-SMR). It is a generic technique that is used to turn any service into one that can tolerate *arbitrary* faults, by extensively replicating the service to mask the behavior of a minority of possibly faulty replicas behind a majority of healthy replicas, operating in consensus.<sup>1</sup> The total number of replicas n is a parameter over the maximum number of faulty replicas f, which the system is configured to tolerate

This work is partially supported by the Fonds National de la Recherche Luxembourg (FNR) through PEARL grant FNR/P14/8149128.

<sup>1</sup> For such techniques to be useful and in order to avoid persistent and shared vulnerabilities, replicas need to be rejuvenated periodically [17,76], they need to be diverse enough [43], and ideally they need to be physically far apart. Diversity and rejuvenation are not covered here.

c The Author(s) 2018

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 619–650, 2018.

https://doi.org/10.1007/978-3-319-89884-1\_22

at any point in time. Typically, n = 3f + 1 for classical protocols such as in [16], and n = 2f + 1 for protocols that rely on tamper-proof components such as in [82]. Because such protocols tolerate arbitrary faults, a faulty replica is one that does not behave according to its specification. For example, it can be one that is controlled by an attacker, or simply one that contains a bug.
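The replica-count arithmetic above can be made concrete with a small illustrative sketch (the function names are ours, not taken from any of the cited protocols), assuming the classical setting n = 3f + 1:

```ocaml
(* Illustrative quorum arithmetic for the classical setting n = 3f + 1
   described above; function names are ours, not from the cited protocols. *)
let replicas f = 3 * f + 1       (* total number of replicas n *)
let reply_quorum f = f + 1       (* matching replies a client waits for *)
let vote_quorum f = 2 * f + 1    (* votes needed to make progress *)

let () =
  (* With f = 1 tolerated fault: n = 4, 2 matching replies, quorums of 3. *)
  Printf.printf "n=%d replies=%d quorum=%d\n"
    (replicas 1) (reply_quorum 1) (vote_quorum 1)
```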

Ideally, we should guarantee the correctness and security of such replicated and distributed, hardened systems to the highest standards known to mankind today. That is, the proof of their correctness should be checked by a machine and their model refined down to machine code. Unfortunately, as pointed out in [29], most distributed algorithms, including BFT protocols, are published in pseudocode or, in the best case, a formal but not executable specification, leaving their safety and liveness questionable. Moreover, Lamport, Shostak, and Pease wrote about such programs: "We know of no area in computer science or mathematics in which informal reasoning is more likely to lead to errors than in the study of this type of algorithm." [54]. Therefore, we focus here on developing a generic and extensible formal verification framework for systematically supporting the mechanical verification of BFT protocols and their implementations.<sup>2</sup>

Our framework provides, among other things, a model that captures the idea of arbitrary/Byzantine faults; a collection of standard assumptions to reason about systems with faulty components; proof tactics that capture common reasoning patterns; as well as a general library of distributed knowledge. All these parts can be reused to reason about any BFT protocol. For example, most BFT protocols share the same high-level structure (they essentially disseminate knowledge and vote on the knowledge they gathered), which we capture in our knowledge theory. We have successfully used this framework to prove a crucial safety property of an implementation of a complex BFT-SMR protocol called PBFT [14–16]. We handle all the functionalities of the base protocol, including garbage collection and view change, which are essential in practical protocols. Garbage collection is used to bound message logs and buffers. The view change procedure enables BFT protocols to make progress in case the *primary* (a distinguished replica used in some fault-tolerant protocols to coordinate votes) becomes faulty.

**Contributions.** Our contributions are as follows: (1) Section 3 presents Velisarios, our continuing effort towards a generic and extensible logic-of-events based framework for verifying implementations of BFT-SMR protocols using Coq [25]. (2) As discussed in Sect. 4, our framework relies on a library to reason about *distributed epistemic knowledge*. (3) We implemented Castro's landmark PBFT protocol, and proved its agreement safety property (see Sect. 5). (4) We implemented a runtime environment to run the OCaml code we extract from Coq (see Sect. 6). (5) We released Velisarios and our PBFT safety proof under an open source licence.<sup>3</sup>

<sup>2</sup> Ideally, both (1) the replication mechanism and (2) the instances of the replicated service should be verified. However, we focus here on (1), which has to be done only once, while (2) needs to be done for every service and for every replica instance.

<sup>3</sup> Available at: https://github.com/vrahli/Velisarios.

**Why PBFT?** We have chosen PBFT because several BFT-SMR protocols designed since then either use (part of) PBFT as one of their main building blocks, or are inspired by it, such as [6,8,26,45,46,82], to cite only a few. Therefore, a bug in PBFT could imply bugs in those protocols too. Castro provided a thorough study of PBFT: he described the protocol in [16], studied how to proactively rejuvenate replicas in [14], and provided a pen-and-paper proof of PBFT's safety in [15,17]. Even though we use a different model—Castro used I/O automata (see Sect. 7.1), while we use a logic-of-events model (see Sect. 3)—our mechanical proof builds on top of his pen-and-paper proof. One major difference is that here we verify actual running code, which we obtain thanks to Coq's extraction mechanism.

### **2 PBFT Recap**

This section provides a rundown of PBFT [14–16], which we use as a running example to illustrate our model of BFT-SMR protocols presented in Sect. 3.

### **2.1 Overview of the Protocol**

We describe here the public-key based version of PBFT, for which Castro provides a formal pen-and-paper proof of its safety. PBFT is considered the first practical BFT-SMR protocol. Compared to its predecessors, it is more efficient and it does not rely on unrealistic assumptions. It works with asynchronous, unreliable networks (i.e., messages can be dropped, altered, delayed, duplicated, or delivered out of order), and it tolerates independent network failures. To achieve this, PBFT assumes strong cryptography in the form of collision-resistant digests, and an existentially unforgeable signature scheme. It supports any deterministic state machine. Each state machine replica maintains the service state and implements the service operations. Clients send requests to all replicas and await f + 1 matching replies from different replicas. PBFT ensures that healthy replicas execute the same operations in the same order.

To tolerate up to f faults, PBFT requires |R| = 3f + 1 replicas. Replicas move through a succession of configurations called *views*. In each view v, one replica (p = v *mod* |R|) assumes the role of *primary* and the others become *backups*. The primary coordinates the votes, i.e., it picks the order in which client requests are executed. When a backup suspects the primary to be faulty, it requests a view-change to select another replica as new primary.
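The view/primary relationship just described can be sketched in a few lines (illustrative only, not the paper's Coq code):

```ocaml
(* Primary selection as described above: in view v, replica v mod |R| is the
   primary and all other replicas are backups. Illustrative sketch; these
   names are ours, not from the Velisarios sources. *)
let primary ~num_replicas v = v mod num_replicas

let is_backup ~num_replicas v i = primary ~num_replicas v <> i
```

A view change to view v + 1 thus rotates the primary role round-robin over the replicas.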

**Normal-Case.** During normal-case operation, i.e., when the primary is not suspected to be faulty by a majority of replicas, clients send requests to be executed, which trigger agreement among the replicas. Various kinds of messages have to be sent among clients and replicas before a client knows its request has been executed. Figure 1 shows the resulting message patterns for PBFT's normal-case operation and view-change protocol. Let us discuss here normal-case operation:

**Fig. 1.** PBFT normal-case (left) and view-change (right) operations


Client and replica authenticity, and message integrity are ensured through signatures of the form ⟨m⟩σᵢ. A replica accepts a message m only if: (1) m's signature is correct, (2) m's view number matches the current view, and (3) the sequence number of m is in the water mark interval (see below).

PBFT buffers pending client requests, processing them later in batches. Moreover, it makes use of checkpoints and water marks (which delimit sequence number intervals) to limit the size of all message logs and to prevent replicas from exhausting the sequence number space.
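The acceptance conditions above can be sketched as a predicate (the names and the exact water-mark interval convention below are ours and simplified relative to the actual protocol):

```ocaml
(* Simplified sketch of the message-acceptance check described above: valid
   signature, matching view, and a sequence number within the water marks
   (low, low + window]. Names and interval convention are illustrative. *)
type incoming = { view : int; seq : int; sig_ok : bool }

let accept ~current_view ~low ~window (m : incoming) =
  m.sig_ok
  && m.view = current_view
  && low < m.seq && m.seq <= low + window
```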

**Garbage Collection.** Replicas store all correct messages that were created or received in a log. Checkpoints are used to limit the number of logged messages by removing the ones that the protocol no longer needs. A replica starts checkpointing after executing a request with a sequence number divisible by some predefined constant, by multicasting the message ⟨CHECKPOINT, v, n, d, i⟩σᵢ to all other replicas. Here n is the sequence number of the last executed request and d is the digest of the state. Once a replica has received f + 1 different checkpoint messages<sup>4</sup> (possibly including its own) for the same n and d, it holds a proof of correctness of the log corresponding to d, which includes messages up to sequence number n. The checkpoint is then called *stable* and all messages with sequence numbers lower than n (except view-change messages) are pruned from the log.
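A checkpoint-stability check along these lines might look as follows (a simplified sketch with hypothetical names, counting checkpoint messages from distinct replicas for the same n and d):

```ocaml
(* Sketch: a checkpoint (n, d) becomes stable once f + 1 checkpoint messages
   from distinct replicas match it, as described above. Names are ours. *)
type checkpoint = { n : int; d : string; sender : int }

let stable ~f ~n ~d (log : checkpoint list) =
  (* Distinct senders whose checkpoint message matches (n, d). *)
  let voters =
    List.sort_uniq compare
      (List.filter_map
         (fun c -> if c.n = n && c.d = d then Some c.sender else None)
         log)
  in
  List.length voters >= f + 1
```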

**View Change.** The view change procedure ensures progress by allowing replicas to change the leader so as to not wait indefinitely for a faulty primary. Each backup starts a timer when it receives a request and stops it after the request has been executed. Expired timers cause the backup to suspect the leader and request a view change. It then stops receiving normal-case messages, and multicasts ⟨VIEW-CHANGE, v + 1, n, s, C, P, i⟩σᵢ, reporting the sequence number n of the last stable checkpoint s, its proof of correctness C, and the set of messages P with sequence numbers greater than n that backup i prepared since then. When the new primary p receives 2f + 1 view-change messages, it multicasts ⟨NEW-VIEW, v + 1, V, O, N⟩σₚ, where V is the set of 2f + 1 valid view-change messages that p received; O is the set of messages prepared since the latest checkpoint reported in V; and N contains only the special *null* request, for which the execution is a no-op. N is added to the O set to ensure that there are no gaps between the sequence numbers of prepared messages sent by the new primary. Upon receiving this new-view message, replicas enter view v + 1 and re-execute the normal-case protocol for all messages in O ∪ N.

We have proved a critical safety property of PBFT, including its garbage collection and view change procedures, which are essential in practical protocols. However, we have not yet developed generic abstractions for reasoning specifically about garbage collection and view changes that could be reused in other protocols; we leave this as future work.

### **2.2 Properties**

PBFT with |R| = 3f +1 replicas is safe and live. Its safety boils down to linearizability [42], i.e., the replicated service behaves like a centralized implementation that executes operations atomically one at a time. Castro used a modified version of linearizability in [14] to deal with faulty clients. As presented in Sect. 5, we proved the crux of this property, namely the agreement property (we leave linearizability for future work).

As informally explained by Castro [14], assuming weak synchrony (which constrains message transmission delays), PBFT is live, i.e., clients will eventually receive replies to their requests. In the future, we plan to extend Velisarios to support liveness and mechanize PBFT's liveness proof.

<sup>4</sup> Castro first required 2f + 1 checkpoint messages [16] but relaxed this requirement in [14].

### **2.3 Differences with Castro's Implementation**

As mentioned above, besides the normal-case operation, our Coq implementation of PBFT handles garbage collection, view changes and request batching. However, we slightly deviated from Castro's implementation [14], primarily in the way checkpoints are handled: we avoid sending messages whose sequence numbers are not between the water marks, and a replica always requires its own checkpoint before clearing its log. Assuming the reader is familiar with PBFT, we now detail these deviations and refer the reader to [14] for comparison.


We slightly deviated from Castro's protocol to make our proofs go through. We leave it for future work to formally study whether we could do without these changes, or whether they are due to shortcomings of the original specification.

### **3 Velisarios Model**

Using PBFT as a running example, we now present our Coq model for Byzantine fault-tolerant distributed systems, which relies on a logic of events—Fig. 2 outlines our formalization.

### **3.1 The Logic of Events**

We adapt the Logic of Events (LoE) we used in EventML [9,11,71] to deal not only with crash faults, but with arbitrary faults in general (including malicious faults).

**Fig. 2.** Outline of formalization

LoE, related to Lamport's notion of causal order [53] and to event structures [60,65], was developed to reason about events occurring in the execution of a distributed system. LoE has recently been used to verify consensus protocols [71,73] and cyber-physical systems [3]. Another standard model of distributed computing is Chandy and Lamport's *global state semantics* [19], where a distributed system is modeled as a single state machine: a state is the collection of all processes at a given time, and a transition takes a message in flight and delivers it to its recipient (a process in the collection). Each of these two models has advantages and disadvantages over the other. We chose LoE because in our experience it corresponds more closely to the way distributed system researchers and developers reason about protocols. As such, it provides a convenient communication medium between distributed systems and verification experts.

In LoE, an event is an abstract entity that corresponds either (1) to the handling of a received message, or (2) to some arbitrary activity about which no information is provided (see the discussion about trigger in Sect. 3.4). We use those arbitrary events to model arbitrary/Byzantine faults. An event happens at a specific point in space/time: the space coordinate of an event is called its location, and the time coordinate is given by a well-founded ordering on events that totally orders all events at the same location. Processes react to the messages that triggered the events happening at their locations one at a time, by transitioning through their states and creating messages to send out, which in turn might trigger other events. In order to reason about distributed systems, we use the notion of *event orderings* (see Sect. 3.4), which essentially are collections of ordered events and represent runs of a system. They are abstract entities that are never instantiated. Rather, when proving a property about a distributed system, one has to prove that the property holds for all event orderings corresponding to all possible runs of the system (see Sects. 3.5 and 5 for examples). Some runs/event orderings are not possible and therefore excluded through assumptions, such as the ones described in Sect. 3.6. For example, exists at most f faulty excludes event orderings where more than f out of n nodes could be faulty.
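To make the space/time structure of events concrete, here is a toy rendition (ours, not the Coq development) where an event is a location paired with a local index, and events at the same location are totally ordered by their index:

```ocaml
(* Toy model of LoE events as described above: an event happens at a location
   (space) and has a local index (time); events at one location are totally
   ordered. This sketch is ours, not the Coq formalization. *)
type event = { loc : string; idx : int }

(* Strict local order: same location, strictly earlier index. *)
let locally_before e1 e2 = e1.loc = e2.loc && e1.idx < e2.idx
```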

In the next few sections, we explain the different components (messages, authentication, event orderings, state machines, and correct traces) of Velisarios, and their use in our PBFT case study. Those components are parameterized by abstract types (parameters include the type of messages and the kind of authentication schemes), which we later have to instantiate in order to reason about a given protocol, e.g. PBFT, and to obtain running code. The choices we made when designing Velisarios were driven by our goal to generate running code. For example, we model cryptographic primitives to reason about authentication.

### **3.2 Messages**

**Model.** Some events are caused by messages of type msg, which is a parameter of our model. Processes react to messages to produce message/destinations pairs (of type DirectedMsg), called *directed messages*. A directed message is typically handled by a message outbox, which sends the message to the listed destinations.<sup>5</sup> A destination is the name (of type name, which is a parameter of our model) of a node participating in the protocol.

**PBFT.** In our PBFT implementation, we instantiate the msg type using the following datatype (we only show some of the normal-case operation messages, leaving out for example the more involved pre-prepare messages—see Sect. 2.1):

```
Inductive PBFTmsg := ...
Inductive Bare_Prepare :=
| bare_prepare (v : View) (n : SeqNum) (d : digest) (i : Rep).
Inductive Prepare :=
| prepare (b : Bare_Prepare) (a : list Token).
```
As for prepares, all messages are defined as follows: we first define bare messages that do not contain authentication tokens (see Sect. 3.3), and then authenticated messages as pairs of a bare message and an authentication token. Views and sequence numbers are nats, while digests are parameters of the specification. PBFT involves two types of nodes: replicas of the form PBFTreplica(*r*), where *r* is of type Rep; and clients of the form PBFTclient(*c*), where *c* is of type Client. Both Rep and Client are parameters of our formalization, such that Rep has cardinality 3f + 1, where f is a parameter that stands for the number of tolerated faults.

### **3.3 Authentication**

**Model.** Our model relies on an abstract concept of keys, which we use to implement and reason about authenticated communication. Capturing authenticity at the level of keys allows us to talk about impersonation through key leakage. Keys are divided into *sending keys* (of type sending key) to authenticate a message for a target node, and *receiving keys* (of type receiving key) to check the validity of a received message. Both sending key and receiving key are parameters of our model.<sup>6</sup> Each node maintains *local keys* (of type local keys), which consists of two lists of *directed keys*: one for sending keys and one for receiving keys. Directed keys are pairs of a key and a list of node names identifying the processes that the holder of the key can communicate with.

<sup>5</sup> Message inboxes/outboxes are part of the runtime environment but not part of the model.

<sup>6</sup> Sending and receiving keys must be different when using asymmetric cryptography, and can be the same when using symmetric cryptography.

Sending keys are used to create *authentication tokens* of type Token, which we use to authenticate messages. Tokens are parameters of our model and abstract away from concrete concepts such as digital signatures or MACs. Typically, a message consists of some data plus some tokens that authenticate the data. Therefore, we introduce the following parameters: (1) the type data, for the kind of data that can be authenticated; (2) a create function to authenticate some data by generating authentication tokens using the sending keys; and (3) a verify function to verify the authenticity of some data by checking that it corresponds to some token using the receiving keys.

Once some data has been authenticated, it is typically sent over the network to other nodes, which in turn need to check the authenticity of the data. Typically, when a process sends an authenticated message to another process it includes its identity somewhere in the message. This identity is used to select the corresponding receiving key to check the authenticity of the data using verify. To extract this claimed identity we require users to provide a data sender function.

It often happens in practice that a message contains more than one piece of authenticated data (e.g., in PBFT, pre-prepare messages contain authenticated client requests). Therefore, we require users to provide a get contained auth data function that extracts all authenticated pieces of data contained in a message. Because we sometimes want to use different tokens to authenticate some data (e.g., when using MACs), an authenticated piece of data of type auth data is defined as a pair of: (1) a piece of data, and (2) a list of tokens.
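A toy instantiation of this create/verify interface might look as follows (ours, not the paper's code; OCaml's non-cryptographic Hashtbl.hash stands in for a MAC purely for illustration, and the symmetric-key setting makes sending and receiving keys coincide):

```ocaml
(* Toy instantiation of the create/verify interface described above.
   Hashtbl.hash is NOT cryptographic; it stands in for a MAC purely for
   illustration. Symmetric setting: sending and receiving keys coincide. *)
type token = int

let create (data : string) (sending_key : string) : token list =
  [ Hashtbl.hash (data ^ "|" ^ sending_key) ]

let verify (data : string) (receiving_key : string) (toks : token list) : bool =
  toks = create data receiving_key

(* An authenticated piece of data pairs the data with its tokens. *)
let auth_data data key = (data, create data key)
```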

**PBFT.** Our PBFT implementation leaves keys and authentication tokens abstract because our safety proof is agnostic to the kinds of these elements. However, we turn them into actual asymmetric keys when extracting OCaml code (see Sect. 6 for more details). The create and verify functions are also left abstract until we extract the code to OCaml. Finally, we instantiate the data (the objects that can be authenticated, i.e., bare messages here), data sender, and get contained auth data parameters using:

```
Inductive PBFTdata := | PBFTdata_request (r : Bare_Request)
  | PBFTdata_prepare (p : Bare_Prepare) | PBFTdata_reply (r : Bare_Reply) ...
Definition PBFTdata_sender (m : data) : option name := match m with
  | PBFTdata_request (bare_request o t c) ⇒ Some (PBFTclient c)
  | PBFTdata_prepare (bare_prepare v n d i) ⇒ Some (PBFTreplica i)
  | PBFTdata_reply (bare_reply v t c i r) ⇒ Some (PBFTreplica i) ...
Definition PBFTget_contained_auth_data (m : msg) : list auth_data := match m with
  | REQUEST (request b a) ⇒ [(PBFTdata_request b, a)]
  | PREPARE (prepare b a) ⇒ [(PBFTdata_prepare b, a)]
  | REPLY (reply b a) ⇒ [(PBFTdata_reply b, a)] ...
```
### **3.4 Event Orderings**

A typical way to reason about a distributed system is to reason about its possible runs, which are sometimes modeled as execution traces [72], and which are captured in LoE using *event orderings*. An *event ordering* is an abstract representation of a run of a distributed system; it provides a formal definition of a *message sequence diagram* as used by system designers (see for example Fig. 1). As opposed to [72], a trace here is not just one sequence of events but instead can be seen as a collection of local traces (one local trace per sequential process), where a local trace is a collection of events all happening at the same location and ordered in time, and such that some events of different local traces are causally ordered. Event orderings are never instantiated. Instead, we express system properties as predicates on event orderings. A system satisfies such a property if every possible execution of the system satisfies the predicate. We first formally define the components of an event ordering, and then present the axioms that these components have to satisfy.

**Components.** An event ordering is formally defined as the tuple:<sup>7</sup>

```
Class EventOrdering :=
  { Event : Type;
    happenedBefore : Event → Event → Prop;
    loc : Event → name;
    direct_pred : Event → option Event;
    trigger : Event → option msg;
    keys : Event → local_keys }.
```

where (1) Event is an abstract type of events; (2) happenedBefore is an ordering relation on events; (3) loc returns the location at which events happen; (4) direct pred returns the direct local predecessor of an event when one exists, i.e., for all events except initial events; (5) given an event *e*, trigger either returns the message that triggered *e*, or it returns None to indicate that no information is available regarding the action that triggered the event (see below); (6) keys returns the keys a node can use at a given event to communicate with other nodes. The event orderings presented here are similar to the ones used in [3,71], which we adapted to handle Byzantine faults by modifying the type of trigger so that events can be triggered by arbitrary actions and not necessarily by the receipt of a message, and by adding support for authentication through keys.

The trigger function returns None to capture the fact that nodes can sometimes behave arbitrarily. This includes processes behaving correctly, i.e., according to their specifications; as well as (possibly malicious) processes deviating from their specifications. Note that this does not prevent us from capturing the behavior of correct processes, because for every event ordering where trigger returns None at an event where the node behaved correctly, there is a similar event ordering where trigger returns the triggering message at that event. To model that at most f nodes out of n can be faulty we use the exists at most f faulty assumption, which enforces that trigger returns None at no more than f nodes.
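The "at most f faulty" constraint can be pictured with a toy check (ours, not the Coq assumption itself) over a run given as a list of (location, optional trigger) pairs:

```ocaml
(* Toy rendition of the "at most f faulty" idea described above: the locations
   at which some event has no trigger (and thus may have behaved arbitrarily)
   number at most f. Ours, not the Coq assumption itself. *)
let possibly_faulty (run : (string * 'm option) list) : string list =
  List.sort_uniq compare
    (List.filter_map
       (fun (l, tr) -> if tr = None then Some l else None)
       run)

let at_most_f_faulty ~f run = List.length (possibly_faulty run) <= f
```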

Moreover, even though non-syntactically valid messages do not trigger events because they are discarded by message boxes, a triggering message could be syntactically valid, but have an invalid signature. Therefore, it is up to the programmer to ensure that processes only react to messages with valid signatures using the verify function. Our authenticated messages were sent non byz and exists at most f faulty assumptions presented in Sect. 3.6 are there to constrain trigger to ensure that at most f nodes out of n can diverge from their specifications, for example, by producing valid signatures even though they are not the nodes they claim to be (using leaked keys of other nodes).

<sup>7</sup> A Coq type class is essentially a dependent record.

**Axioms.** The following axioms characterize the behavior of these components:


**Notation.** We use *a* ≺ *b* to stand for (happenedBefore *a b*); *a* ⪯ *b* to stand for (*a* ≺ *b* or *a* = *b*); and *a* ⊑ *b* to stand for (*a* ⪯ *b* and loc *a* = loc *b*). We also sometimes write EO instead of EventOrdering.

Some functions take an event ordering as a parameter. For readability, we sometimes omit those when they can be inferred from the context. Similarly, we will often omit type declarations of the form (*<sup>T</sup>* : Type).

**Correct Behavior.** To prove properties about distributed systems, one only reasons about processes that have a correct behavior. To do so we only reason about events in event orderings that are correct in the sense that they were triggered by some message:

```
Definition isCorrect (e : Event) := match trigger e with Some m ⇒ True | None ⇒ False end.
Definition arbitrary (e : Event) := ~ isCorrect e.
```
Next, we characterize correct replica histories as follows: (1) First we say that an event *e* has a correct trace if all local events prior to *e* are correct. (2) Then, we say that a node *i* has a correct trace before some event *e*, not necessarily happening at *i*, if all events happening before *e* at *i* have a correct trace:

```
Definition has_correct_bounded_trace (e : Event) := forall e', e' ⊑ e → isCorrect e'.
Definition has_correct_trace_before (e : Event) (i : name) :=
  forall e', e' ⪯ e → loc e' = i → has_correct_bounded_trace e'.
```
### **3.5 Computational Model**

**Model.** We now present our computational model, which we use when extracting OCaml programs. Unlike in EventML [71] where systems are first specified as *event observers* (abstract processes), and then later refined to executable code, we skip here event observers, and directly specify systems using executable state machines, which essentially consist of an update function and a current state. We define a system of distributed state machines as a function that maps names to state machines. Systems are parametrized by a function that associates state types with names in order to allow for different nodes to run different machines.

```
Definition Update S I O := S → I → (option S * O).
Record StateMachine S I O := MkSM { halted : bool; update : Update S I O; state : S }.
Definition System (F : name → Type) I O := forall (i : name), StateMachine (F i) I O.
```
where *S* is the type of the machine's state, *I*/*O* are the input/output types, and halted indicates whether the state machine is still running or not.

Let us now discuss how we relate state machines and events. We define state sm before event and state sm after event that compute a machine's state before and after a given event *e*. These states are computed by extracting the local history of events up to *e* using direct pred, and then updating the state machine by running it on the triggering messages of those events. These functions return None if some arbitrary event occurs or the machine halts sometime along the way. Otherwise they return Some *s*, where *s* is the state of the machine updated according to the events. Therefore, assuming they return Some amounts to assuming that all events prior to *e* are correct, i.e., we can prove that if state sm after event *sm e* = Some *s* then has correct trace before *e* (loc *e*). As illustrated below, we use these functions to adopt a Hoare-like reasoning style by stating pre/post-conditions on the state of a process prior and after some event.
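The behavior of these functions can be sketched as a fold over the local trace (a simplified OCaml rendition, ours; the Coq versions additionally track the direct-predecessor structure of events):

```ocaml
(* Sketch of state_sm_after_event as described above: replay the update
   function over the triggers of the local history; None propagates as soon
   as some event is arbitrary (no trigger) or the machine halts. Simplified
   rendition, not the Coq code. *)
type ('s, 'i) sm = { update : 's -> 'i -> 's option; init : 's }

let state_after (m : ('s, 'i) sm) (triggers : 'i option list) : 's option =
  List.fold_left
    (fun st tr ->
       match st, tr with
       | Some s, Some i -> m.update s i   (* correct event: step the machine *)
       | _ -> None)                       (* arbitrary event or halted machine *)
    (Some m.init) triggers
```

Assuming the result is Some *s* thus amounts to assuming that every event in the replayed history was correct.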

**PBFT.** We implement PBFT replicas as state machines, which we derive from an update function that dispatches input messages to the corresponding handlers. Finally, we define PBFTsys as the function that associates PBFTsm with replicas and a halted machine with clients (because we do not reason here about clients).

```
Definition PBFTupdate (i : Rep) := fun state msg ⇒ match msg with
  | REQUEST r ⇒ PBFThandle_request i state r
  | PREPARE p ⇒ PBFThandle_prepare i state p ...
Definition PBFTsm (i : Rep) := MkSM false (PBFTupdate i) (initial_state i).
Definition PBFTsys := fun name ⇒ match name with
  | PBFTreplica i ⇒ PBFTsm i | PBFTclient c ⇒ haltedSM end.
```
Let us illustrate how we reason about state machines through a simple example that shows that they maintain a view that only increases over time. It shows a local property, while Sect. 5 presents the distributed agreement property that makes use of the assumptions presented in Sect. 3.6. As mentioned above, we prove such properties for all possible event orderings, which means that they hold for all possible runs of the system. In this lemma, *s1* is the state prior to the event *e*, and *s2* is the state after handling *e*. The lemma has no pre-conditions, and its post-condition states that the view in *s1* is no larger than the view in *s2*.

```
Lemma current_view_increases : forall (eo : EO) (e : Event) i s1 s2,
    state_sm_before_event (PBFTsm i) e = Some s1
    → state_sm_after_event (PBFTsm i) e = Some s2
    → current_view s1 ≤ current_view s2.
```
### **3.6 Assumptions**

**Model.** Let us now turn to the assumptions we make regarding the network and the behavior of correct and faulty nodes.

**Assumption 1.** Proving safety properties of crash fault-tolerant protocols that only require reasoning about past events, such as agreement, does not require reasoning about faults and faulty replicas. To prove such properties, one merely has to follow the causal chains of events back in time, and if a message is received by a node then it must have been sent by some node that had not crashed at that time. The state of affairs is different when dealing with Byzantine faults.

One issue is that Byzantine nodes can deviate from their specifications or impersonate other nodes. However, BFT protocols are designed in such a way that nodes only react to collections of messages, called *certificates*, that are larger than the number of faults. This means that there is always at least one correct node that can be used to track down causal chains of events.

A second issue is that, in general, we cannot assume that some received message was sent as such by the designated (correct) sender of the message, because messages can be manipulated while in flight. As captured by the authenticated_messages_were_sent_or_byz predicate defined below,<sup>8</sup> we can only assume that the authenticated parts of the received message were actually sent by the designated senders, possibly inside larger messages, provided the senders did not leak their keys. As usual, we assume that attackers cannot break the cryptographic primitives, i.e., that they cannot authenticate messages without the proper keys [14].

```
1.  Definition authenticated_messages_were_sent_or_byz (P : AbsProcess) :=
2.    forall e (a : auth_data),
3.      In a (bind_op_list get_contained_auth_data (trigger e))
4.      → verify_auth_data (loc e) a (keys e) = true
5.      → exists e', e' ≺ e ∧ am_auth a = authenticate (am_data a) (keys e')
6.        ∧ ( (exists dst m,
7.              In a (get_contained_auth_data m) ∧ In (m,dst) (P eo e')
8.              ∧ data_sender (loc e) (am_data a) = Some (loc e'))
9.          ∨
10.         (exists e",
11.           e" ⪯ e' ∧ arbitrary e' ∧ arbitrary e" ∧ got_key_for (loc e) (keys e") (keys e')
12.           ∧ data_sender (loc e) (am_data a) = Some (loc e")) ).
```

<sup>8</sup> For readability, we show a slightly simplified version of this axiom. The full axiom can be found in https://github.com/vrahli/Velisarios/blob/master/model/EventOrdering.v.
This assumption says that if the authenticated piece of data *a* is part of the message that triggered some event *e* (L.3), and *a* is verified (L.4), then there exists a prior event *e'* such that the data was authenticated while handling *e'* using the keys available at that time (L.5). Moreover, (1) either the sender of the data was correct while handling *e'* and sent the data as part of a message following the process described by *P* (L.6–8); or (2) the node at which *e'* occurred was Byzantine at that time, and either it generated the data itself (e.g. when *e"*=*e'*), or it impersonated some other replica (by obtaining the keys that some node leaked at event *e"*) (L.10–12).

We used a few undefined abstractions in this predicate: An AbsProcess is an abstraction of a process, i.e., a function that returns the collection of messages generated while handling a given event: (forall (*eo* : EO) (*e* : Event), list DirectedMsg). The bind_op_list function is wrapped around get_contained_auth_data to handle the fact that trigger might return None, in which case bind_op_list returns nil. The verify_auth_data function takes an authenticated message *a* and some keys and: (1) invokes data_sender (defined in Sect. 3.3) to extract the expected sender *s* of *a*; (2) searches among its keys for a receiving key that it can use to check that *s* indeed authenticated *a*; and (3) finally verifies the authenticity of *a* using that key and the verify function. The authenticate function simply calls create and uses the sending keys to create tokens. The got_key_for function takes a name *i* and two local keys *lk1* and *lk2*, and states that the sending keys for *i* in *lk1* are all included in *lk2*.
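As an illustration only (not the Coq code), the three steps of verify_auth_data can be sketched in Python, with HMAC-SHA256 standing in for the abstract authenticate/verify pair; the dictionary of receiving keys and the data_sender callback are hypothetical stand-ins for the model's keys.

```python
import hmac
import hashlib

def authenticate(data: bytes, key: bytes) -> bytes:
    """Stand-in for the abstract authenticate: create a token from data."""
    return hmac.new(key, data, hashlib.sha256).digest()

def verify_auth_data(data_sender, receiving_keys, data: bytes, token: bytes) -> bool:
    sender = data_sender(data)           # (1) extract the expected sender
    key = receiving_keys.get(sender)     # (2) look up a receiving key for that sender
    if key is None:
        return False
    # (3) verify the token using that key
    return hmac.compare_digest(token, authenticate(data, key))
```

With receiving keys {"replica0": b"k0"} and a data_sender that maps every message to "replica0", a token produced with key b"k0" verifies, while a tampered token does not.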

However, because we never reason about faulty nodes, we never have to deal with the right disjunct of the above formula. Therefore, this assumption about received messages can be greatly simplified when we know that the sender is a correct replica, which is always the case when we use this assumption, because BFT protocols are designed so that there is always a correct node that can be used to track down causal chains of events. We now define the following simpler assumption, which we have proved to be a consequence of authenticated_messages_were_sent_or_byz:

```
Definition authenticated_messages_were_sent_non_byz (P : AbsProcess) :=
  forall (e : Event) (a : auth_data) (c : name),
    In a (bind_op_list get_contained_auth_data (trigger e))
    → has_correct_trace_before e c
    → verify_auth_data (loc e) a (keys e) = true
    → data_sender (loc e) (am_data a) = Some c
    → exists e' dst m, e' ≺ e ∧ loc e' = c
        ∧ am_auth a = authenticate (am_data a) (keys e')
        ∧ In a (get_contained_auth_data m)
        ∧ In (m,dst) (P eo e')
```
As opposed to the previous formula, this one assumes that the authenticated data was sent by a correct replica, which has a correct trace prior to the event *e*—the event when the message containing *a* was handled.

**Assumption 2.** Because processes need to store their keys to sign and verify messages, we must connect those keys to the ones in the model. We do this through the correct keys assumption, which states that for each event *e*, if a process has a correct trace up to *e*, then the keys (keys *e*) from the model are the same as the ones stored in its state (which are computed using state sm before event).

**Assumption 3.** Finally, we present our assumption regarding the number of faulty nodes. There are several ways to state that there can be at most *f* faulty nodes. One simple definition is (where node is a subset of name as discussed in Sect. 4.2):

```
Definition exists_at_most_f_faulty (E : list Event) (f : nat) :=
  exists (faulty : list node), length faulty ≤ f
    ∧ forall e1 e2, In e2 E → e1 ⪯ e2 → ¬ In (loc e1) faulty
           → has_correct_bounded_trace e1.
```
This assumption says that at most *f* nodes can be faulty by stating that the events happening at nodes that are not in the list of faulty nodes *faulty*, of length *f* , are correct up to some point characterized by the partial cut *E* of a given event ordering (i.e., the collection of events happening before those in *E*).

**PBFT Assumption 4.** In addition to the ones above, we made further assumptions about PBFT. Replicas sometimes send message hashes instead of sending the entire messages. For example, pre-prepare messages contain client requests, but prepare and commit messages simply contain digests of client requests. Consequently, our PBFT formalization is parametrized by the following *create* and *verify* functions, and we assume that the create function is collision resistant:<sup>9</sup>

```
Class PBFThash := MkPBFThash {
  create_hash : list PBFTmsg → digest; verify_hash : list PBFTmsg → digest → bool; }.
Class PBFThash_axioms := MkPBFThash_axioms {
  create_hash_collision_resistant :
    forall msgs1 msgs2, create_hash msgs1 = create_hash msgs2 → msgs1 = msgs2; }.
```
The version of PBFT, called PBFT-PK in [14], that we implemented relies on digital signatures. However, we did not have to make any more assumptions regarding the cryptographic primitives than the ones presented above, and in particular we did not assume anything that is true about digital signatures and false about MACs. Therefore, our safety proof works when using either digital signatures or MAC vectors. As discussed below, this is true because we adapted the way messages are verified (we have not verified the MAC version of PBFT but a slight variant of PBFT-PK) and because we do not deal with liveness.

<sup>9</sup> Note that our current collision resistant assumption is too strong because it is always possible to find two distinct messages that are hashed to the same hash. We leave it to future work to turn it into a more realistic probabilistic assumption.

As Castro showed [14, Chap. 3], PBFT-PK has to be adapted when digital signatures are replaced by MAC vectors. Among other things, it requires "significant and subtle changes to the view change protocol" [14, Sect. 3.2]. Also, to the best of our knowledge, in PBFT-PK backups do not check the authenticity of requests upon receipt of pre-prepares. They only check the authenticity of requests before executing them [14, p. 42]. This works when using digital signatures but not when using MACs: one backup might not execute the request because its part of the MAC vector does not check out, while another backup executes the request because its part of the MAC vector checks out, which would lead to inconsistent states and break safety. Castro lists other problems related to liveness.

Instead, as in the MAC version of PBFT [14, p. 42], in our implementation we always check requests' validity when checking the validity of a pre-prepare. If we were to check the validity of requests only before executing them, we would have to assume that two correct replicas would either both be able to verify the data, or both would not be able to do so. This assumption holds for digital signatures but not for MAC vectors.

### **4 Methodology**

Because distributed systems are all about exchanging information among nodes, we have developed a theory that captures abstractions and reasoning patterns to deal with knowledge dissemination (see Sect. 4.4). In the presence of faulty nodes, one has to ensure that this knowledge is reliable. Fault-tolerant state-machine replication protocols provide such guarantees by relying on certificates, which ensure that we can always get hold of a correct node to trace back information through the system. This requires reasoning about the past, i.e., reasoning by induction on causal time using the happenedBefore relation.

### **4.1 Automated Inductive Reasoning**

We use induction on causal time to prove both distributed and local properties. As discussed here, we have automated the typical reasoning pattern used to prove local properties. As an example, in our PBFT formalization, we proved the following local property: if a replica has a prepare message in its log, then it either received or generated it. Moreover, as for any kind of program, using Velisarios we prove local properties about processes by reasoning about all possible paths they can take when reacting to messages. A typical proof of such a lemma using Velisarios goes as follows: (1) we go by induction on events; (2) we split the code of a process into all possible execution paths; (3) we prune the paths that cannot happen because they invalidate some hypotheses of the lemma being proved; and (4) we automatically discharge some other cases using the induction hypothesis. We packaged this reasoning as a Coq tactic, which in practice can significantly reduce the number of cases to prove, and used this automation technique to prove local properties of PBFT, such as Castro's A.1.2 local invariants [14]. Because of PBFT's complexity, our Coq tactic typically reduces the number of cases to prove from between 50 and 60 down to around 7, sometimes fewer, as shown in the histogram of goals left to prove interactively after automation.


### **4.2 Quorums**

As usual, we use quorum theory to trace back correct information between nodes. A (Byzantine) quorum w.r.t. a given set of nodes N is a subset Q of N such that f + 1 ≤ (2 ∗ |Q|) − |N| (where |X| is the size of X), i.e., every two quorums intersect [59,83] in sufficiently many replicas.<sup>10</sup> Typically, a quorum corresponds to a majority of nodes that agree on some property. In the case of state machine replication, quorums are used to ensure that a majority of nodes agree to update the state using the same operation. If we know that two quorums intersect, then we know that both quorums agree, and therefore that the states cannot diverge. In order to reason about quorums, we have proved the following general lemma:<sup>11</sup>

```
Lemma overlapping_quorums :
  forall (l1 l2 : NRlist node), exists Correct,
    (length l1 + length l2) - num_nodes ≤ length Correct
    ∧ subset Correct l1 ∧ subset Correct l2 ∧ no_repeats Correct.
```

This lemma implies that if we have two sets of nodes *l1* and *l2* (NRlist ensures that the sets have no repeats) such that the sum of their lengths is greater than the total number of nodes (num_nodes), then there must exist an overlapping subset of nodes (*Correct*). We use this result in Sect. 4.4 below.
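A quick numeric check of this bound, as an illustrative Python sketch (not part of the development): for PBFT-style parameters n = 3f + 1, two quorums of size 2f + 1 overlap in at least f + 1 nodes, so with at most f faults the overlap always contains a correct node.

```python
def min_overlap(q1: int, q2: int, n: int) -> int:
    """Lower bound on the overlap size from the lemma: |l1| + |l2| - num_nodes."""
    return q1 + q2 - n

for f in range(1, 5):
    n = 3 * f + 1          # total number of replicas
    q = 2 * f + 1          # quorum / strong-certificate size
    # the overlap exceeds the f possible faults by one
    assert min_overlap(q, q, n) == f + 1
```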

The node type parameter is the collection of nodes that can participate in quorums. For example, PBFT replicas can participate in quorums but clients cannot. This type comes with a node2name function to convert nodes into names.

### **4.3 Certificates**

Lemmas that require reasoning about several replicas are much more complex than local properties. They typically require reasoning about some information computed by a collection of replicas (such as quorums) that vouch for the information. In PBFT, a collection of 2f + 1 messages from different replicas is called

<sup>10</sup> We use here Castro's notation where quorums are *majority* quorums [79] (also called *write quorums*) that require intersections to be non-empty, as opposed to *read quorums* that are only required to intersect with write quorums [36].

<sup>11</sup> We present here a simplified version for readability.

a *strong (or quorum) certificate*, and a collection of f + 1 messages from different replicas is called a *weak certificate*.

When working with strong certificates, one typically reasons as follows: (1) Because PBFT requires 3f + 1 replicas, two certificates of size 2f + 1 always intersect in f + 1 replicas. (2) One message among those f + 1 messages must be from a correct replica because at most f replicas can be faulty. (3) This correct replica can vouch for the information of both quorums—we use that replica to trace back the corresponding information to the point in space/time where/when it was generated. We will get back to this in Sect. 4.4.

When working with weak certificates, one typically reasons as follows: because the certificate has size f + 1 and there are at most f faulty nodes, there must be one correct replica that can vouch for the information of the certificate.
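The pigeonhole argument behind weak certificates can be sketched as follows (illustrative Python, hypothetical replica names): a set of f + 1 distinct senders cannot all belong to a faulty set of size at most f.

```python
def correct_vouchers(senders: set, faulty: set) -> set:
    """Senders of a certificate that lie outside the faulty set."""
    return senders - faulty

# f = 1: a weak certificate has f + 1 = 2 distinct senders,
# while at most f = 1 node is faulty
senders = {"r0", "r1"}
faulty = {"r1"}
assert len(correct_vouchers(senders, faulty)) >= 1
```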

### **4.4 Knowledge Theory**

**Model.** Let us now present an excerpt of our distributed epistemic knowledge library. Knowledge is a widely studied concept [10,30,31,37–39,70]. It is often captured using possible-worlds models, which rely on Kripke structures: an agent knows a fact if that fact is true in all possible worlds. For distributed systems, agents are nodes and a possible world at a given node is essentially one that has the same local history as the one of the current world, i.e., it captures the current state of the node. As Halpern stresses, e.g. in [37], such a definition of knowledge is *external* in the sense that it cannot necessarily be computed, though some work has been done towards deriving programs from knowledge-based specifications [10]. We follow a different, more pragmatic and computational approach, and say that a node knows some piece of data if it is stored locally, as opposed to the external and logical notion of knowing facts mentioned above. This computational notion of knowledge relies on exchanging messages to propagate it, which is what is required to derive programs from knowledge-based specifications (i.e., to compute that some knowledge is gained [20,37]).

We now extend the model presented in Sect. 3 with two epistemic modal operators *know* and *learn* that express what it means for a process to know and learn some information, and which bear some resemblance with the *fact discovery* and *fact publication* notions discussed in [38]. Formally, we extend our model with the following parameters, which can be instantiated as many times as needed for all the pieces of known/learned data that one wants to reason about—see below for examples:


The lak data type is the type of "raw" data that we have knowledge of, while lak info is some distinct information that might be shared by different pieces of data. For example, PBFT replicas collect batches of 2f + 1 (pre-)prepare messages from different replicas that share the same view, sequence number, and digest. In that case, the (pre-)prepare messages are the raw data that contain the common information consisting of a view, a sequence number, and a digest. The lak memory type is the type of objects used to store one's knowledge, such as a state machine state. One has to provide a lak data2info function to extract the information embedded in some piece of data. The lak know predicate explains what it means to know some piece of data. The lak data2owner function extracts the "owner" of some piece of data, typically the node that generated the data. In order to authenticate pieces of data, the lak data2auth function extracts some piece of authenticated data from some piece of raw data. For convenience, we define the following wrapper around lak data2owner:

```
Definition lak_data2node (d : lak_data) : name := node2name (lak_data2owner d).
```
Let us now turn to the two main components of our theory, namely the know and learn epistemic modal operators. These operators provide an abstraction barrier: they allow us to abstract away from *how* knowledge is stored and computed, in order to focus on the mere *fact* that we have that knowledge.

```
Definition know (sm : node → StateMachine lak_memory) (e : Event) (d : lak_data) :=
  exists mem i, loc e = node2name i
    ∧ state_sm_after_event (sm i) e = Some mem
    ∧ lak_know d mem.
```
where we simply write (StateMachine *S*) for a state machine with a state of type *S*, that takes messages as inputs, and outputs lists of directed messages. This states that the state machine (*sm i*) knows the data *d* at event *e* if its state is *mem* at *e* and (lak know *d mem*) is true. We define learn as follows:

```
Definition learn (e : Event) (d : lak_data) :=
  exists i, loc e = node2name i
    ∧ In (lak_data2auth d) (bind_op_list get_contained_auth_data (trigger e))
    ∧ verify_auth_data (loc e) (lak_data2auth d) (keys e) = true.
```
This states that a node learns *d* at some event *e*, if *e* was triggered by a message that contains the data *d*. Moreover, because we deal with Byzantine faults, we require that to learn some data one has to be able to verify its authenticity.

Next, we define a few predicates that are useful to track down knowledge. The first one is a local predicate that says that for a state machine to know about a piece of information it has to either have learned it or generated it.

```
Definition learn_or_know (sm : node → StateMachine lak_memory) :=
  forall (d : lak_data) (e : Event),
    know sm e d → (exists e', e' ⪯ e ∧ learn e' d) ∨ lak_data2node d = loc e.
```
The next one is a distributed predicate that states that if one learns some piece of information that is owned by a correct node, then that correct node must have known that piece of information:

```
Definition learn_if_know (sm : node → StateMachine lak_memory) :=
  forall (d : lak_data) (e : Event),
    (learn e d ∧ has_correct_trace_before e (lak_data2node d))
    → exists e', e' ≺ e ∧ loc e' = lak_data2node d ∧ know sm e' d.
```
Using these two predicates, we have proved this general lemma about knowledge propagating through nodes:

```
Lemma know_propagates :
  forall (e : Event) (sm : node → StateMachine lak_memory) (d : lak_data),
    (learn_or_know sm ∧ learn_if_know sm)
    → (know sm e d ∧ has_correct_trace_before e (lak_data2node d))
    → exists e', e' ⪯ e ∧ loc e' = lak_data2node d ∧ know sm e' d.
```
This lemma says that, assuming learn or know and learn if know, if one knows at some event *e* some data *d* that is owned by a correct node, then that correct node must have known that data at a prior event *e'*. We use this lemma to track down information through correct nodes.

As mentioned in Sect. 4.3, when reasoning about distributed systems, one often needs to reason about certificates, i.e., about collections of messages from different sources. In order to capture this, we introduce the following know_certificate predicate, which says that the state machine *sm* knows the information *nfo* at event *e* if there exists a list *l* of pieces of data of length at least *k* (the certificate size) that come from different sources, such that *sm* knows each of these pieces of data, and each piece of data carries the common information *nfo*:

```
Definition know_certificate (sm : node → StateMachine lak_memory)
           (e : Event) (k : nat) (nfo : lak_info) (P : list lak_data → Prop) :=
  exists (l : list lak_data),
    k ≤ length l ∧ no_repeats (map lak_data2owner l) ∧ P l
    ∧ forall d, In d l → (know sm e d ∧ nfo = lak_data2info d).
```
Using this predicate, we can then combine the quorum and knowledge theories to prove the following lemma, which captures the fact that if there are two quorums for information *nfo1* (known at *e1* ) and *nfo2* (known at *e2* ), and the intersection of the two quorums is guaranteed to contain a correct node, then there must be a correct node (at which *e1'* and *e2'* happen) that owns and knows both *nfo1* and *nfo2*—this lemma follows from know propagates and overlapping quorums:

```
Lemma know_in_intersection :
  forall (sm : node → StateMachine lak_memory) (e1 e2 : Event) (nfo1 nfo2 : lak_info)
         (k f : nat) (P : list lak_data → Prop) (E : list Event),
    (learn_or_know sm ∧ learn_if_know sm)
    → (k ≤ num_nodes ∧ num_nodes + f < 2 * k)
    → (exists_at_most_f_faulty E f ∧ In e1 E ∧ In e2 E)
    → (know_certificate sm e1 k nfo1 P ∧ know_certificate sm e2 k nfo2 P)
    → exists e1' e2' d1 d2, loc e1' = loc e2' ∧ e1' ⪯ e1 ∧ e2' ⪯ e2
        ∧ loc e1' = lak_data2node d1 ∧ loc e2' = lak_data2node d2
        ∧ know sm e1' d1 ∧ know sm e2' d2
        ∧ nfo1 = lak_data2info d1 ∧ nfo2 = lak_data2info d2.
```
Similarly, we proved the following lemma, which captures the fact that there is always a correct replica that can vouch for the information of a weak certificate:

```
Lemma know_weak_certificate :
  forall (e : Event) (sm : node → StateMachine lak_memory) (k f : nat)
         (nfo : lak_info) (P : list lak_data → Prop) (E : list Event),
    (f < k ∧ exists_at_most_f_faulty E f ∧ In e E ∧ know_certificate sm e k nfo P)
    → exists d, has_correct_trace_before e (lak_data2node d)
        ∧ know sm e d ∧ nfo = lak_data2info d.
```
**PBFT.** One of the key lemmas to prove PBFT's safety says that if two correct replicas have prepared some requests with the same sequence and view numbers, then the requests must be the same [14, Inv.A.1.4]. As mentioned in Sect. 2.1, a replica has prepared a request if it received pre-prepare and prepare messages from a quorum of replicas. To prove this lemma, we instantiated LearnAndKnow as follows: lak data can either be a pre-prepare or a prepare message; lak info is the type of triples view/sequence number/digest; lak memory is the type of states maintained by replicas; lak data2info extracts the view, sequence number and digest contained in pre-prepare and prepare messages; lak know states that the pre-prepare or prepare message is stored in the state; lak data2owner extracts the sender of the message; and lak data2auth is similar to the PBFTget contained auth data function presented in Sect. 3.6. The two predicates learn or know and learn if know, which we proved using the tactic discussed in Sect. 4.1, are true about this instance of LearnAndKnow. Inv.A.1.4 is then a straightforward consequence of know in intersection applied to the two quorums.

### **5 Verification of PBFT**

*Agreement.* Velisarios is designed as a general, reusable, and extensible framework that can be instantiated to prove the correctness of any BFT protocol. We demonstrated its usability by proving that our PBFT implementation satisfies the standard agreement property, which is the crux of linearizability (we leave linearizability for future work—see Sect. 2.2 for a high-level definition). Agreement states that, regardless of the view, any two replies sent by correct replicas *i1* and *i2* at events *e1* and *e2* for the same timestamp *ts* to the same client *c* contain the same reply. We proved that this holds in any event ordering that satisfies the assumptions from Sect. 3.6:<sup>12</sup>

```
Lemma agreement :
  forall (eo : EventOrdering) (e1 e2 : Event) (v1 v2 : View) (ts : Timestamp)
         (c : Client) (i1 i2 : Rep) (r1 r2 : Request) (a1 a2 : list Token),
  authenticated_messages_were_sent_or_byz_sys eo PBFTsys ∧ correct_keys eo
  → (exists_at_most_f_faulty [e1,e2] f ∧ loc e1 = PBFTreplica i1 ∧ loc e2 = PBFTreplica i2)
  → In (send_reply v1 ts c i1 r1 a1) (output_system_on_event PBFTsys e1)
  → In (send_reply v2 ts c i2 r2 a2) (output_system_on_event PBFTsys e2)
  → r1 = r2.
```
<sup>12</sup> See agreement in https://github.com/vrahli/Velisarios/blob/master/PBFT/ PBFTagreement.v.

where Timestamps are nats; authenticated messages were sent or byz sys is defined on systems using authenticated messages were sent or byz; the function output system on event is similar to state sm after event (see Sect. 3.5) but returns the outputs of a given state machine at a given event instead of returning its state; and send reply builds a reply message. To prove this lemma, we proved most of the invariants stated by Castro in [14, Appendix A]. In addition, we proved that if the last executed sequence number of two correct replicas is the same, then these two replicas have, among other things, the same service state.<sup>13</sup>

As mentioned above, because our model is based on LoE, we only ever prove such properties by induction on causal time. Similarly, Castro proved most of his invariants by induction on the length of the executions. However, he used other induction principles to prove some lemmas, such as Inv.A.1.9, which he proved by induction on views [14, p. 151]. This invariant says that prepared requests have to be consistent with the requests sent in pre-prepare messages by the primary. A straightforward induction on causal time was more natural in our setting.

Castro used a simulation method to prove PBFT's safety: he first proved the safety of a version without garbage collection and then proved that the version with garbage collection implements the one without. This requires defining two versions of the protocol. Instead, we directly prove the safety of the one with garbage collection. This involved proving further invariants about stored, received and sent messages, essentially that they are always within the water marks.

*Proof Effort.* In terms of proof effort, developing Velisarios and verifying PBFT's agreement property took us around 1 person year. Our generic Velisarios framework consists of around 4000 lines of specifications and around 4000 lines of proofs. Our verified implementation of PBFT consists of around 20000 lines of specifications and around 22000 lines of proofs.

### **6 Extraction and Evaluation**

*Extraction.* To evaluate our PBFT implementation (i.e., PBFTsys defined in Sect. 3.5—a collection of state machines), we generate OCaml code using Coq's extraction mechanism. Most parameters, such as the number of tolerated faults, are instantiated before extraction. Note that not all parameters need to be instantiated. For example, as mentioned in Sect. 3.1, neither do we instantiate event orderings, nor do we instantiate our assumptions (such as exists at most f faulty), because they are not used in the code but are only used to prove that properties are true about all possible runs. Also, keys, signatures, and digests are only instantiated by stubs in Coq. We replace those stubs when extracting OCaml code by implementations provided by the nocrypto [66] library, which is the cryptographic library we use to hash, sign, and verify messages (we use RSA).

<sup>13</sup> See same states if same next to execute in https://github.com/vrahli/Velisarios/ blob/master/PBFT/PBFTsame states.v.

*Evaluation.* To run the extracted code in a real distributed environment, we implemented a small trusted runtime environment in OCaml that uses the Async library [5] to handle sender/receiver threads. Among other things, we show here that the average latency of our implementation is acceptable compared to the state-of-the-art BFT-SMaRt [8] library. Note that because we do not offer a new protocol, but essentially a re-implementation of PBFT, we expect performance to scale similarly in other execution scenarios, such as the ones studied by Castro in [14]. We ran our experiments on desktops with 16 GB of memory and 8 i7-6700 cores running at 3.40 GHz. We report some of our experiments, in which we used a single client and a simple state machine whose state is a number and whose operations add or subtract some value.

We ran a local simulation to measure the performance of our PBFT implementation without network and signatures: when 1 client sends 1 million requests, it takes on average 27.6µs for the client to receive f +1 (f = 1) replies.

**Fig. 3.** (1) Single machine (top/left); (2) several machines (top/right); (3) single machine using MACs (bottom/left); (4) view change response time (bottom/right)

Top/left of Fig. 3 shows the experiment where we varied f from 1 to 3, and replicas sent messages, signed using RSA, through sockets, but on a single machine. As mentioned above, we implemented the digital signature-based version of PBFT, while BFT-SMaRt uses a more efficient MAC-based authentication scheme, which in part explains why BFT-SMaRt is around one order of magnitude faster than our implementation. As in [14, Table 8.9], we expect a similar improvement when using the more involved, and as of yet not formally verified, MAC-based version of PBFT (bottom/left of Fig. 3 shows the average response time when replacing digital signatures by MACs, without adapting the rest of the protocol). Top/right of Fig. 3 presents results when running our version of PBFT and BFT-SMaRt on several machines, for f = 1. Finally, bottom/right of Fig. 3 shows the response time of our view-change protocol. In this experiment, we killed the primary after 16 s of execution, and it took around 7 s for the system to recover.

*Trusted Computing Base.* The TCB of our system includes: (1) the fact that our LoE model faithfully reflects the behavior of distributed systems (see Sect. 3.4); (2) the validity of our assumptions: authenticated messages were sent or byz; exists at most f faulty; correct keys; and create hash collision resistant (Sect. 3.6); (3) Coq's logic and implementation; (4) OCaml and the nocrypto and Async libraries we use in our runtime environment, and the runtime environment itself (Sect. 6); (5) the hardware and software on which our framework is running.

### **7 Related Work**

Our framework is not the first one for implementing and reasoning about the correctness of distributed systems (see Fig. 4). However, to the best of our knowledge, (1) it is the first theorem-prover-based tool for verifying the correctness of asynchronous Byzantine fault-tolerant protocols and their implementations; and (2) we provide the first mechanical proof of the safety of a PBFT implementation. Velisarios has evolved from our earlier EventML framework [71], primarily to reason about Byzantine faults and distributed epistemic knowledge.


**Fig. 4.** Comparison with related work

### **7.1 Logics and Models**

**IOA** [33–35,78] is the model used by Castro [14] to prove PBFT's safety. It is a programming/specification language for describing asynchronous distributed systems as I/O automata [58] (labeled state transition systems) and stating their properties. While IOA is state-based, the logic we use in this paper is event-based. IOA can interact with a large range of tools such as type checkers, simulators, model checkers, and theorem provers, and there is support for synthesizing Java code [78]. In contrast, our methodology allows us to both implement and verify protocols within the same tool, namely Coq.

**TLA**<sup>+</sup> [24,51] is a language for specifying and reasoning about systems. It combines: (1) TLA [52], which is a temporal logic for describing systems [51], and (2) set theory, to specify data structures. TLAPS [24] uses a collection of theorem provers, proof assistants, SMT solvers, and decision procedures to mechanically check TLA proofs. Model checker integration helps catch errors before verification attempts. TLA<sup>+</sup> has been used in a large number of projects (e.g., [12,18,44,56,63,64]) including proofs of safety and liveness of Multi-Paxos [18], and safety of a variant of an abstract model of PBFT [13]. To the best of our knowledge, TLA<sup>+</sup> does not perform program synthesis.

**The Heard-Of (HO) Model** [23] requires processes to execute in lock-step through rounds into which the distributed algorithms are divided. Asynchronous fault-tolerant systems are treated as synchronous systems with adversarial environments that cause messages to be dropped. The HO-model was implemented in Isabelle/HOL [22] and used, for example, to verify the EIGByz [7] Byzantine agreement algorithm for synchronous systems with reliable links. This formalization uses the notion of *global state of the system* [19], while our approach relies on Lamport's *happened before* relation [53], which does not require reasoning about a distributed system as a single entity (a global state). Model checking and the HO-model were also used in [21,80,81] for verifying the crash fault-tolerant consensus algorithms presented in [23]. To the best of our knowledge, there is no tool that allows generating code from algorithms specified using the HO-model.

**Event-B** [1] is a set-theory-based language for modeling reactive systems and for *refining* high-level abstract specifications into low-level concrete ones. It supports code generation [32,61], with some limitations (not all features are covered). The Rodin [2] platform for Event-B provides support for refinement, and for automated and interactive theorem proving. Both have been used in a number of projects, such as: to prove the safety and liveness of self-⋆ systems [4]; to prove the agreement and validity properties of the synchronous crash-tolerant Floodset consensus algorithm [57]; and to prove the agreement and validity of synchronous Byzantine agreement algorithms [50]. In [50], the authors assume that messages cannot be forged (whereas in PBFT up to f faulty nodes can forge messages), and do not verify implementations of these algorithms.

### **7.2 Tools**

**Verdi** [85,86] is a framework to develop and reason about distributed systems using Coq. As in our framework, Verdi leaves no gap between verified and running code: OCaml code is extracted directly from the verified Coq implementation. Verdi provides a compositional way of specifying distributed systems by applying *verified system transformers*. For example, a transformer based on Raft [67]—an alternative to Paxos—turns a distributed system into a crash-tolerant one. One difference between our respective methods is that they verify a system by reasoning about the evolution of its global state, while we use Lamport's happened before relation. Moreover, they do not deal with the full spectrum of arbitrary faults (e.g., malicious faults).

**Disel** [75,84] is a verification framework that implements a separation-style program logic, and that enables compositional verification of distributed systems.

**IronFleet** [40,41] is a framework for building and reasoning about distributed systems using Dafny [55] and the Z3 SMT solver [62]. Because systems are both implemented in and verified using Dafny, IronFleet also prevents gaps between running and verified code. It uses a combination of TLA-style state-machine refinements [51] to reason about the distributed aspects of protocols, and Floyd-Hoare-style imperative verification techniques to reason about local behavior. The authors have implemented, among other things, the Paxos-based state machine replication library IronRSL, and verified its safety and liveness.

**PSync** [28] is a domain specific language embedded in Scala, that enables executing and verifying fault-tolerant distributed algorithms in synchronous and partially asynchronous networks. PSync is based on the HO-model, and has been used to implement several crash fault-tolerant algorithms. Similar to the Verdi framework, PSync makes use of a notion of global state and supports reasoning based on the multi-sorted first-order *Consensus verification logic* (CL) [27]. To prove safety, users have to provide invariants, which CL checks for validity. Unlike Verdi, IronFleet and PSync, we focus on Byzantine faults.

**ByMC** is a model checker for verifying safety and liveness of fault-tolerant distributed algorithms [47–49]. It applies an automated method for model checking parametrized threshold-guarded distributed algorithms (e.g., processes waiting for messages from a majority of distinct senders). ByMC is based on a short counter-example property, which says that if a distributed algorithm violates a temporal specification then there is a counterexample whose length is bounded and independent of the parameters (e.g. the number of tolerated faults).

**Ivy** [69] allows debugging infinite-state systems using bounded verification, and formally verifying their safety by gradually building universally quantified inductive invariants. To the best of our knowledge, Ivy does not support faults.

**Actor Services** [77] allows verifying the distributed and functional properties of programs communicating via asynchronous message passing at the level of the source code (they use a simple Java-like language). It supports modular reasoning and proving liveness. To the best of our knowledge, it does not deal with faults.

**PVS** has been extensively used for verification of synchronous systems that tolerate malicious faults such as in [74], to the extent that its design was influenced by these verification efforts [68].

### **8 Conclusions and Future Work**

We introduced Velisarios, a framework to implement and reason about BFT-SMR protocols using the Coq theorem prover, and described a methodology based on learn/know epistemic modal operators. We used this framework to prove the safety of a complex system, namely Castro's PBFT protocol. In the future, we plan to also tackle liveness/timeliness. Indeed, proving the safety of a distributed system is far from being enough: a protocol that never runs (i.e., that is not live) is useless. Following the same line of reasoning, we want to tackle timeliness because, for real-world systems, it is not enough to prove that a system will *eventually reply*. One often desires that the system replies in a timely fashion.

### **References**



**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Program Analysis and Automated Verification

## **Evaluating Design Tradeoffs in Numeric Static Analysis for Java**

Shiyi Wei<sup>1(B)</sup>, Piotr Mardziel<sup>2</sup>, Andrew Ruef<sup>3</sup>, Jeffrey S. Foster<sup>3</sup>, and Michael Hicks<sup>3</sup>

> <sup>1</sup> The University of Texas at Dallas, Richardson, USA — swei@utdallas.edu
> <sup>2</sup> Carnegie Mellon University, Moffett Field, USA — piotrm@gmail.com
> <sup>3</sup> University of Maryland, College Park, USA — {awruef,jfoster,mwh}@cs.umd.edu

**Abstract.** Numeric static analysis for Java has a broad range of potentially useful applications, including array bounds checking and resource usage estimation. However, designing a scalable numeric static analysis for real-world Java programs presents a multitude of design choices, each of which may interact with others. For example, an analysis could handle method calls via either a top-down or bottom-up interprocedural analysis. Moreover, this choice could interact with how we choose to represent aliasing in the heap and/or whether we use a relational numeric domain, e.g., convex polyhedra. In this paper, we present a family of abstract interpretation-based numeric static analyses for Java and systematically evaluate the impact of 162 analysis configurations on the DaCapo benchmark suite. Our experiment considered the precision and performance of the analyses for discharging array bounds checks. We found that top-down analysis is generally a better choice than bottom-up analysis, and that using access paths to describe heap objects is better than using summary objects corresponding to points-to analysis locations. Moreover, these two choices are the most significant, while choices about the numeric domain, representation of abstract objects, and context-sensitivity make much less difference to the precision/performance tradeoff.

### **1 Introduction**

Static analysis of numeric program properties has a broad range of useful applications. Such analyses can potentially detect array bounds errors [50], analyze a program's resource usage [28,30], detect side channels [8,11], and discover vectors for denial of service attacks [10,26].

One of the major approaches to numeric static analysis is abstract interpretation [18], in which program statements are evaluated over an abstract domain until a fixed point is reached. Indeed, the first paper on abstract interpretation [18] used numeric intervals as one example abstract domain, and many subsequent researchers have explored abstract interpretation-based numeric static analysis [13,22–25,31].

Despite this long history, applying abstract interpretation to real-world Java programs remains a challenge. Such programs are large, have many interacting methods, and make heavy use of heap-allocated objects. In considering how to build an analysis that aims to be sound but also precise, prior work has explored some of these challenges, but not all of them together. For example, several works have considered the impact of the choice of numeric domain (e.g., intervals vs. convex polyhedra) in trading off precision for performance but not considered other tradeoffs [24,38]. Other works have considered how to integrate a numeric domain with analysis of the heap, but unsoundly model method calls [25] and/or focus on very precise properties that do not scale beyond small programs [23,24]. Some scalability can be recovered by using programmer-specified pre- and postconditions [22]. In all of these cases, there is a lack of consideration of the broader design space in which many implementation choices interact. (Sect. 7 considers prior work in detail.)

In this paper, we describe and then systematically explore a large design space of fully automated, abstract interpretation-based numeric static analyses for Java. Each analysis is identified by a choice of five configurable options—the numeric domain, the heap abstraction, the object representation, the interprocedural analysis order, and the level of context sensitivity. In total, we study 162 analysis configurations to assess both how individual configuration options perform overall and to study interactions between different options. To our knowledge, our basic analysis is one of the few fully automated numeric static analyses for Java, and we do not know of any prior work that has studied such a large static analysis design space.

We selected analysis configuration options that are well-known in the static analysis literature and that are key choices in designing a Java static analysis. For the numeric domain, we considered both intervals [17] and convex polyhedra [19], as these are popular and bookend the precision/performance spectrum. (See Sect. 2.)

Modeling the flow of data through the heap requires handling pointers and aliasing. We consider three different choices of *heap abstraction*: using *summary objects* [25,27], which are *weakly updated*, to summarize multiple heap locations; *access paths* [21,52], which are *strongly updated*; and a combination of the two.

To implement these abstractions, we use an ahead-of-time, global *pointsto analysis* [44], which maps static/local variables and heap-allocated fields to abstract objects. We explore three variants of *abstract object representation*: the standard *allocation-site abstraction* (the most precise) in which each syntactic new in the program represents an abstract object; *class-based abstraction* (the least precise) in which each class represents all instances of that class; and a *smushed string abstraction* (intermediate precision) which is the same as allocation-site abstraction except strings are modeled using a class-based abstraction [9]. (See Sect. 3.)
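The three object representations can be summarized as naming schemes. The sketch below is illustrative only (the names and helper functions are hypothetical, not WALA's API): each scheme maps an allocation (a syntactic `new` site plus the allocated class) to the abstract name it receives.

```python
# Illustrative naming schemes for the three abstract object representations.
def alloc_site(site_id, cls):
    return f"alloc@{site_id}"          # ALLO: one name per allocation site

def class_based(site_id, cls):
    return cls                         # CLAS: one name per class

def smushed_string(site_id, cls):
    # SMUS: all strings share one name; other types keep site-based names.
    return "String" if cls == "java.lang.String" else f"alloc@{site_id}"
```

Fewer distinct names mean smaller points-to sets and cheaper alias checks, at the cost of conflating objects that the analysis can then no longer distinguish.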

We compare three choices in the *interprocedural analysis order* we use to model method calls: *top-down analysis*, which starts with main and analyzes callees as they are encountered; *bottom-up analysis*, which starts at the leaves of the call tree and instantiates method summaries at call sites; and a hybrid analysis that is bottom-up for library methods and top-down for application code. In general, top-down analysis explores fewer methods, but it may analyze callees multiple times. Bottom-up analysis explores each method once but needs to create summaries, which can be expensive.

Finally, we compare three kinds of *context-sensitivity* in the points-to analysis: *context-insensitive* analysis, *1-CFA analysis* [46] in which one level of calling context is used to discriminate pointers, and *type-sensitive analysis* [49] in which the type of the receiver is the context. (See Sect. 4.)

We implemented our analysis using WALA [2] for its intermediate representation and points-to analyses, and APRON [33,41] for the interval numeric domain or ELINA [47,48] for the polyhedral one. We then applied all 162 analysis configurations to the DaCapo benchmark suite [6], using the numeric analysis to try to prove array accesses are within bounds. We measured the analyses' performance and the number of array bounds checks they discharged. We analyzed our results by using a multiple linear regression over analysis features and outcomes, and by performing data visualizations.

We studied three research questions. First, we examined how analysis configuration affects performance. We found that using summary objects causes significant slowdowns, e.g., the vast majority of the analysis runs that timed out used summary objects. We also found that polyhedral analysis incurs a significant slowdown, but only half as much as summary objects. Surprisingly, bottom-up analysis provided little performance advantage generally, though it did provide some benefit for particular object representations. Finally, context-insensitive analysis is faster than context-sensitive analysis, as might be expected, but the difference is not great when combined with more approximate (class-based and smushed string) abstract object representations.

Second, we examined how analysis configuration affects precision. We found that using access paths is critical to precision. We also found that the bottom-up analysis has worse precision than top-down analysis, especially when using summary objects, and that using a more precise abstract object representation improves precision. But other traditional ways of improving precision do so only slightly (the polyhedral domain) or not significantly (context-sensitivity).

Finally, we looked at the precision/performance tradeoff for all programs. We found that using access paths is always a good idea, both for precision and performance, and top-down analysis works better than bottom-up. While summary objects, originally proposed by Fu [25], do help precision for some programs, the benefits are often marginal when considered as a percentage of all checks, so they tend not to outweigh their large performance disadvantage. Lastly, we found that the precision gains for more precise object representations and polyhedra are modest, and performance costs can be magnified by other analysis features.


**Table 1.** Analysis configuration options, and their possible settings.

In summary, our empirical study provides a large, comprehensive evaluation of the effects of important numeric static analysis design choices on performance, precision, and their tradeoff; it is the first of its kind. Our code and data are available at https://github.com/plum-umd/JANA.

### **2 Numeric Static Analysis**

A *numeric static analysis* is one that tracks numeric properties of memory locations, e.g., that *x* ≤ 5 or *y* > *z*. A natural starting point for a numeric static analysis for Java programs is numeric abstract interpretation over program variables within a single procedure/method [18].

A standard abstract interpretation expresses numeric properties using a *numeric abstract domain*, of which the most common are *intervals* (also known as boxes) and *convex polyhedra*. Intervals [17] define abstract states using inequalities of the form *p relop n*, where *p* is a variable, *n* is a constant integer, and *relop* is a relational operator such as ≤. A variable such as *p* is sometimes called a *dimension*, as it describes one axis of a numeric space. Convex polyhedra [19] define abstract states using linear relationships between variables and constants, e.g., of the form 3*p*<sub>1</sub> − *p*<sub>2</sub> ≤ 5. Intervals are less precise but more efficient than polyhedra. Operations on intervals have time complexity linear in the number of dimensions, whereas the time complexity of polyhedra operations is exponential in the number of dimensions.<sup>1</sup>

<sup>1</sup> Further, the time complexity of join is *O*(*d* · *c*<sup>2*d*+1</sup>), where *c* is the number of constraints and *d* is the number of dimensions [47].

Numeric abstract interpreters, including our own analyses, are usually flow-sensitive, i.e., each program point has an associated abstract state characterizing properties that hold at that point. Variable assignments are *strong updates*, meaning information about the variable is replaced by information from the right-hand side of the assignment. At merge points (e.g., after the completion of a conditional), the abstract states of the possible prior states are *joined* to yield properties that hold regardless of the branch taken. Loop bodies are reanalyzed until their constituent statements' abstract states reach a fixed point. Reaching a fixed point is accelerated by applying the numeric domain's standard *widening* operator [4] in place of join after a fixed number of iterations.
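The join-and-widen loop above can be made concrete. The following is a minimal sketch (not the paper's implementation) of flow-sensitive interval analysis of the loop `i = 0; while (true) i = i + 1`, with widening applied at the loop head from the first iteration for brevity:

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

def join(a, b):
    # Least upper bound of two intervals (lo, hi).
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    # Standard interval widening: any unstable bound jumps to infinity.
    lo = old[0] if old[0] <= new[0] else NEG_INF
    hi = old[1] if old[1] >= new[1] else POS_INF
    return (lo, hi)

def loop_head_invariant():
    i = (0, 0)                          # abstract value of i entering the loop
    while True:
        body = (i[0] + 1, i[1] + 1)     # transfer function of "i = i + 1"
        new = widen(i, join(i, body))
        if new == i:                    # fixed point reached
            return i
        i = new

print(loop_head_invariant())            # (0, inf): i >= 0 at the loop head
```

Without widening, the upper bound would climb by one per iteration and the analysis would not terminate; widening jumps the unstable bound to infinity after one step.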

Scaling a basic numeric abstract interpreter to full Java requires making many design choices. Table 1 summarizes the key choices we study in this paper. Each configuration option has a range of settings that potentially offer different precision/performance tradeoffs. Different options may interact with each other to affect the tradeoff. In total, we study five options with two or three settings each. We have already discussed the first option, the numeric domain (ND), for which we consider intervals (INT) and polyhedra (POL). The next two options consider the heap, and are discussed in the next section, and the last two options consider method calls, and are discussed in Sect. 4.

For space reasons, our paper presentation focuses on the high-level design and tradeoffs. Detailed algorithms are given formally in the technical report [51] for the heap and interprocedural analysis.

### **3 The Heap**

The numeric analysis described so far is sufficient only for analyzing code with local, numeric variables. To analyze numeric properties of heap-manipulating programs, we must also consider heap locations *x.f*, where *x* is a reference to a heap-allocated object, and *f* is a numeric field.<sup>2</sup> To do so requires developing a *heap abstraction* (HA) that accounts for aliasing. In particular, when variables *x* and *y* may point to the same heap object, an assignment to *x.f* could affect *y.f*. Moreover, the referent of a pointer may be uncertain, e.g., the true branch of a conditional could assign location *o*<sup>1</sup> to *x*, while the false branch could assign *o*<sup>2</sup> to *x*. This uncertainty must be reflected in subsequent reads of *x.f*.

We use a *points-to analysis* to reason about aliasing. A points-to analysis computes a mapping *Pt* from variables *x* and access paths *x.f* to (one or more) *abstract objects* [44]. If *Pt* maps two variables/paths *p*<sup>1</sup> and *p*<sup>2</sup> to a common abstract object *o* then *p*<sup>1</sup> and *p*<sup>2</sup> *may alias*. We also use points-to analysis to determine the call graph, i.e., to determine what method may be called by an expression *x.m*(*...*) (discussed in Sect. 4).

<sup>2</sup> In our implementation, statements such as *z* = *x.f.g* are decomposed so that paths are at most length one, e.g., *w* = *x.f*; *z* = *w.g*.
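The may-alias query derived from points-to results is a simple set intersection. A sketch, with made-up variable and abstract-object names:

```python
# Hypothetical points-to results: Pt maps each variable to the set of
# abstract objects it may reference.
Pt = {"x": {"o1"}, "y": {"o1", "o2"}, "z": {"o3"}}

def may_alias(p, q):
    # p and q may alias iff their points-to sets share an abstract object.
    return bool(Pt[p] & Pt[q])

# x and y may alias (both may point to o1); x and z cannot.
```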

#### **3.1 Summary Objects (SO)**

The first heap abstraction we study is based on Fu [25]: use a *summary object* (SO) to abstract information about multiple heap locations as a single abstract state "variable" [27]. As an example, suppose that *Pt*(*x*) = {*o*} and we encounter the assignment *x.f* := 5. Then in this approach, we add a variable *o*<sub>*f*</sub> to the abstract state, modeling the field *f* of object *o*, and we add the constraint *o*<sub>*f*</sub> = 5. Subsequent assignments to such summary objects must be *weak updates*, to respect the *may alias* semantics of the points-to analysis. For example, suppose *y.f* may alias *x.f*, i.e., *o* ∈ *Pt*(*x*) ∩ *Pt*(*y*). Then after a later assignment *y.f* := 7 the analysis would weakly update *o*<sub>*f*</sub> with 7, producing the constraint 5 ≤ *o*<sub>*f*</sub> ≤ 7 in the abstract state. This constraint conservatively models that either *o*<sub>*f*</sub> = 5 or *o*<sub>*f*</sub> = 7, since the assignment to *y.f* may or may not affect *x.f*.

In general, weak updates are more expensive than strong updates, and reading a summary object is more expensive than reading a variable. A strong update to *x* is implemented by *forgetting* *x* in the abstract state,<sup>3</sup> and then re-adding it to be equal to the assigned-to value. Note that *x* cannot appear in the assigned-to value because programs are converted into static single assignment form (Sect. 5). A weak update—which is not directly supported in the numeric domain libraries we use—is implemented by copying the abstract state, strongly updating *x* in the copy, and then joining the two abstract states. Reading from a summary object *o*<sub>*f*</sub> requires "expanding" the abstract state with a copy *o*′<sub>*f*</sub> of the summary object and its constraints, creating a constraint on *o*′<sub>*f*</sub>, and then forgetting *o*′<sub>*f*</sub>. Doing this ensures that operations on a variable into which a summary object is read do not affect prior reads. A normal read just references the read variable.
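The copy-update-join recipe for weak updates can be sketched over a simple interval-valued state. The helpers below stand in for the numeric-domain library operations; the names are illustrative, not APRON's or ELINA's API:

```python
TOP = (float("-inf"), float("inf"))

def strong_update(state, name, itv):
    s = dict(state)      # forget the old value, then re-add the new one
    s[name] = itv
    return s

def join_states(s1, s2):
    return {n: (min(s1[n][0], s2[n][0]), max(s1[n][1], s2[n][1])) for n in s1}

def weak_update(state, name, itv):
    # Copy the state, strongly update the copy, then join with the original.
    return join_states(state, strong_update(state, name, itv))

state = {"o_f": (5, 5)}                      # after x.f := 5, with Pt(x) = {o}
state = weak_update(state, "o_f", (7, 7))    # y.f := 7, where y may alias x
print(state["o_f"])                          # (5, 7): o_f is either 5 or 7
```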

Fu [25] argues that this basic approach is better than ignoring heap locations entirely by measuring how often field reads are not unconstrained, as they would be for a heap-unaware analysis. However, it is unclear whether the approach is sufficiently precise for applications such as array-bounds check elimination. Using the polyhedral numeric domain should help. For example, a Buffer class might store an array in one field and a conservative bound on the array's length in another. The polyhedral domain will permit relating the latter to the former while the interval domain will not. But the slowdown due to the many added summary objects may be prohibitive.

#### **3.2 Access Paths (AP)**

An alternative heap abstraction we study is to treat *access paths* (AP) as if they are normal variables, while still accounting for possible aliasing [21,52]. In particular, a path *x.f* is modeled as a variable *x*<sub>*f*</sub>, and an assignment *x.f* := *n* strongly updates *x*<sub>*f*</sub> to be *n*. At the same time, if there exists another path *y.f* and *x* and *y* may alias, then we must weakly update *y*<sub>*f*</sub> as possibly containing *n*. In general, determining which paths must be weakly updated depends on the abstract object representation and context-sensitivity of the points-to analysis.

<sup>3</sup> Doing so has the effect of "connecting" constraints that are transitive via *x*. For example, given *y* ≤ *x* ≤ 5, forgetting *x* would yield the constraint *y* ≤ 5.

Two key benefits of AP over SO are that (1) AP supports strong updates to paths *x.f*, which are more precise and less expensive than weak updates, and (2) AP may require fewer variables to be tracked, since, in our design, access paths are mostly local to a method whereas points-to sets are computed across the entire program. On the other hand, SO can do better at summarizing invariants about heap locations pointed to by other heap locations, i.e., not necessarily via an access path. Especially when performing an interprocedural analysis, such information can add useful precision.

**Combined (AP+SO).** A natural third choice is to combine AP and SO. Doing so sums both the costs and benefits of the two approaches. An assignment *x.f* := *n* strongly updates *x*<sub>*f*</sub> and weakly updates *o*<sub>*f*</sub> for each *o* in *Pt*(*x*) and each *y*<sub>*f*</sub> where *Pt*(*x*) ∩ *Pt*(*y*) ≠ ∅. Reading from *x.f* when it has not been previously assigned to is just a normal read, after first strongly updating *x*<sub>*f*</sub> to be the join of the summary reads of *o*<sub>*f*</sub> for each *o* ∈ *Pt*(*x*).
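The AP+SO assignment rule can be sketched as follows: a strong update of the access path, plus weak updates of each summary object and each may-aliasing path. The state keys, points-to sets, and interval join here are illustrative, not the paper's implementation:

```python
def jn(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))   # interval join

def assign_field(state, Pt, x, f, itv):
    s = dict(state)
    s[f"{x}.{f}"] = itv                         # strong update of the path x.f
    for o in Pt[x]:                             # weak updates of summaries o.f
        key = f"<{o}>.{f}"
        if key in s:
            s[key] = jn(s[key], itv)
    for y in Pt:                                # weak updates of aliasing y.f
        if y != x and (Pt[x] & Pt[y]) and f"{y}.{f}" in s:
            s[f"{y}.{f}"] = jn(s[f"{y}.{f}"], itv)
    return s

Pt = {"x": {"o"}, "y": {"o"}}
state = {"x.f": (0, 0), "y.f": (0, 0), "<o>.f": (0, 0)}
state = assign_field(state, Pt, "x", "f", (5, 5))
print(state)  # {'x.f': (5, 5), 'y.f': (0, 5), '<o>.f': (0, 5)}
```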

### **3.3 Abstract Object Representation (OR)**

Another key precision/performance tradeoff is the *abstract object representation* (OR) used by the points-to analysis. In particular, when *Pt*(*x*) = {*o*<sub>1</sub>*, ..., o*<sub>*n*</sub>}, where do the names *o*<sub>1</sub>*, ..., o*<sub>*n*</sub> come from? The answer impacts the naming of summary objects, the granularity of alias checks for assignments to access paths, and the precision of the call graph, which requires aliasing information to determine which methods are targeted by a dynamic dispatch *x.m*(*...*).

As shown in the third row of Table 1, we explore three representations for abstract objects. The first choice names abstract objects according to their *allocation site* (ALLO)—all objects allocated at the same program point have the same name. This is precise but potentially expensive, since there are many possible allocation sites, and each path *x.f* could be mapped to many abstract objects. We also consider representing abstract objects using *class names* (CLAS), where all objects of the same class share the same abstract name, and a hybrid *smushed string* (SMUS) approach, where every String object has the same abstract name but objects of other types have allocation-site names [9]. The class name approach is the least precise but potentially more efficient since there are fewer names to consider. The smushed string analysis is somewhere in between. The question is whether the reduction in names helps performance enough, without overly compromising precision.

### **4 Method Calls**

So far we have considered the first three options of Table 1, which handle integer variables and the heap. This section considers the last two options: the interprocedural analysis order (AO) and context sensitivity (CS).

### **4.1 Interprocedural Analysis Order (AO)**

We implement three styles of interprocedural analysis: top-down (TD), bottom-up (BU), and their combination (TD+BU). The TD analysis starts at the program entry point and, as it encounters method calls, analyzes the body of the callee (memoizing duplicate calls). The BU analysis starts at the leaves of the call graph and analyzes each method in isolation, producing a summary of its behavior [29,53]. (We discuss call graph construction in the next subsection.) This summary is then instantiated at each method call. The hybrid analysis works top-down for application code but bottom-up for any code from the Java standard library.

**Top-Down (TD).** Assuming the analyzer knows the method being called, a simple approach to top-down analysis would be to transfer the caller's state to the beginning of callee, analyze the callee in that state, and then transfer the state at the end of the callee back to the caller. Unfortunately, this approach is prohibitively expensive because the abstract state would accumulate all local variables and access paths across all methods along the call-chain.

We avoid this blowup by analyzing a call to method *m* while considering only relevant local variables and heap abstractions. Ignoring the heap for the moment, the basic approach is as follows. First, we make a copy *C*<sub>*m*</sub> of the caller's abstract state *C*. In *C*<sub>*m*</sub>, we set variables for *m*'s formal numeric arguments to the actual arguments and then forget (as defined in Sect. 3.1) the caller's local variables. Thus *C*<sub>*m*</sub> will only contain the portion of *C* relevant to *m*. We analyze *m*'s body, starting in *C*<sub>*m*</sub>, to yield the final state *C*′<sub>*m*</sub>. Lastly, we merge *C* and *C*′<sub>*m*</sub>, strongly update the variable that receives the returned result, and forget the callee's local variables—thus avoiding adding the callee's locals to the caller's state.
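Ignoring the heap, the TD call treatment can be sketched over a non-relational interval state. Everything here is a simplified illustration (the callee body is a hypothetical `int m(int p) { return p + 1; }`), not the paper's algorithm:

```python
def analyze_call(C, actuals, formals, callee, result_var):
    # Copy only the relevant part of C: bind formals to actuals and drop
    # the caller's locals from the callee's starting state C_m.
    C_m = {f: C[a] for f, a in zip(formals, actuals)}
    C_m_out = callee(C_m)                  # analyze m's body, yielding C'_m
    merged = dict(C)
    merged[result_var] = C_m_out["ret"]    # strong update of the result
    return merged                          # callee locals never enter C

def m(state):
    # Interval transfer function of "return p + 1".
    return {"ret": (state["p"][0] + 1, state["p"][1] + 1)}

out = analyze_call({"a": (3, 4), "b": (0, 0)}, ["a"], ["p"], m, "r")
print(out)  # {'a': (3, 4), 'b': (0, 0), 'r': (4, 5)}
```

Note that the caller's unrelated local `b` never reaches the callee's state, which is the point of the copy-and-forget step.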

Now consider the heap. If we are using summary objects, when we copy *C* to *C*<sub>*m*</sub> we do not forget those objects that might be used by *m* (according to the points-to analysis). As *m* is analyzed, the summary objects will be weakly updated, ultimately yielding state *C*′<sub>*m*</sub> at *m*'s return. To merge *C*′<sub>*m*</sub> with *C*, we first forget the summary objects in *C* not forgotten in *C*<sub>*m*</sub> and then concatenate *C*′<sub>*m*</sub> with *C*. The result is that updated summary objects from *C*′<sub>*m*</sub> replace those that were in the original *C*.

If we are using access paths, then at the call we forget access paths in *C* because assignments in *m*'s code might invalidate them. But if we have an access path *x.f* in the caller and we pass *x* to *m*, then we retain *x.f* in the callee but rename it to use *m*'s parameter name. For example, *x.f* becomes *y.f* if *m*'s parameter is *y*. If *y* is never assigned to in *m*, we can map *y.f* back to *x.f* (in the caller) once *m* returns.<sup>4</sup> All other access paths in *C*<sub>*m*</sub> are forgotten prior to concatenating with the caller's state.

Note that the above reasoning is only for numeric values. We take no particular steps for pointer values as the points-to analysis already tracks those across all methods.

<sup>4</sup> Assignments to *y.f* in the callee are fine; only assignments to *y* are problematic.

**Bottom Up (BU).** In the BU analysis, we analyze a method *m*'s body to produce a *method summary* and then instantiate the summary at calls to *m*. Ignoring the heap, producing a method summary for *m* is straightforward: start analyzing *m* in a state *C<sub>m</sub>* in which its (numeric) parameters are unconstrained variables. When *m* returns, forget all variables in the final state except the parameters and return value, yielding a state *C′<sub>m</sub>* that is the method summary. Then, when *m* is called, we concatenate *C′<sub>m</sub>* with the current abstract state; add constraints between the parameters and their actual arguments; strongly update the variable receiving the result with the summary's returned value; and then forget the summary's variables.
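
A minimal sketch of this summarize-then-instantiate scheme over intervals follows (hypothetical names, not the paper's code). Because intervals are non-relational, the summary of m(y) { return y + 1; } cannot relate ret to y, which previews the domain limitation discussed below.

```python
# Hypothetical sketch of bottom-up summarization over intervals.
INF = float("inf")
TOP = (-INF, INF)

def summarize(callee_body, formals):
    # Analyze m with its numeric parameters unconstrained, then keep
    # only the parameters and return value: this state is the summary.
    cm = {f: TOP for f in formals}
    final = callee_body(cm)
    return {v: iv for v, iv in final.items() if v in formals or v == "ret"}

def apply_summary(caller_state, summary, formals, actuals, ret_var):
    # Instantiate: meet each formal with its actual argument, then
    # strongly update the variable receiving the result.
    inst = dict(summary)
    for f, a in zip(formals, actuals):
        (lo1, hi1), (lo2, hi2) = inst[f], caller_state[a]
        inst[f] = (max(lo1, lo2), min(hi1, hi2))
    return {**caller_state, ret_var: inst["ret"]}

def m_body(state):  # m(y) { return y + 1; }
    lo, hi = state["y"]
    return {**state, "ret": (lo + 1, hi + 1)}

summ = summarize(m_body, ["y"])
caller = {"x": (0, 10)}
result = apply_summary(caller, summ, ["y"], ["x"], "r")
# With intervals, summ["ret"] is unconstrained: the summary cannot
# record ret = y + 1, so r stays unconstrained even though x is [0, 10].
# A relational domain (e.g., polyhedra) would avoid this loss.
```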

When using the polyhedral numeric domain, *C′<sub>m</sub>* can express relationships between input and output parameters, e.g., ret ≤ z or ret = x + y. For the interval domain, which is non-relational, summaries are more limited, e.g., they can express ret ≤ 100 but not ret ≤ x. As such, we expect bottom-up analysis to be far more useful with the polyhedral domain than the interval domain.

*Summary Objects.* Now consider the heap. Recall that when using summary objects in the TD analysis, reading a path *x.f* into *z* "expands" each summary object *o.f* when *o* ∈ *Pt*(*x*) and strongly updates *z* with the join of these expanded objects, before forgetting them. This expansion makes a copy of each summary object's constraints so that later use of *z* does not incorrectly impact the summary. However, when analyzing a method bottom-up, we may not yet know all of a summary object's constraints. For example, if *x* is passed into the current method, we will not (yet) know if *o.f* is assigned to a particular numeric range in the caller.

We solve this problem by allocating a fresh, unconstrained *placeholder object* at each read of *x.f* and including it in the initialization of the assigned-to variable *z*. The placeholder is also retained in *m*'s method summary. Then, at a call to *m*, we instantiate each placeholder with the constraints in the caller involving the placeholder's summary location. We also create a fresh placeholder in the caller and weakly update it with the placeholder in the callee; doing so allows further constraints to be added from calls further up the call chain.
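
The placeholder mechanism can be sketched as follows (invented names, intervals only; not the paper's code). A read of *x.f* in the callee allocates a fresh, unconstrained placeholder, and at a call site the placeholder is instantiated with the caller's constraints on the summary location it stands for.

```python
# Hypothetical sketch of placeholder objects in bottom-up analysis.
INF = float("inf")
_fresh = iter(range(10**6))

def read_field_bu(state, summary_loc):
    # Allocate a fresh, unconstrained placeholder for this read.
    ph = f"ph{next(_fresh)}:{summary_loc}"
    state[ph] = (-INF, INF)
    return ph

def instantiate(summary, caller_constraints):
    # Replace each placeholder's interval with what the caller knows
    # about the summary location it stands for.
    out = {}
    for v, iv in summary.items():
        if v.startswith("ph"):
            loc = v.split(":", 1)[1]
            out[v] = caller_constraints.get(loc, iv)
        else:
            out[v] = iv
    return out

# Callee: z = x.f, where Pt(x) = {o}. The placeholder stands for o.f.
callee_state = {}
p = read_field_bu(callee_state, "o.f")
callee_state["z"] = callee_state[p]
summary = dict(callee_state)

# At a call site where the caller knows o.f is in [0, 5], instantiation
# refines the placeholder. (Propagating the refinement onward to z
# requires relational constraints, which plain intervals cannot record.)
inst = instantiate(summary, {"o.f": (0, 5)})
```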

*Access Paths.* If we are using access paths, we treat them just as in TD—each *x.f* is allocated a special variable that is strongly updated when possible, according to the points-to analysis. These are not kept in method summaries. When also using summary objects, at the first read of *x.f* we initialize it from the summary objects derived from *x*'s points-to set, following the above expansion procedure. Otherwise *x.f* will be unconstrained.

**Hybrid (TD+BU).** In addition to TD or BU analysis (only), we implemented a hybrid strategy that performs TD analysis for the application, but BU analysis for code from the Java standard library. Library methods are analyzed first, bottom-up. Application method calls are analyzed top-down. When an application method calls a library method, it applies the BU method call approach. TD+BU could potentially be better than TD because library methods, which are likely called many times, only need to be analyzed once. TD+BU could similarly be better than BU because application methods, which are likely not called as many times as library methods, can use the lower-overhead TD analysis.

Now, consider the interaction between the heap abstraction and the analysis order. The use of access paths (only) does not greatly affect the normal TD/BU tradeoff: TD may yield greater precision by adding constraints from the caller when analyzing the callee, while BU's lower precision comes with the benefit of analyzing method bodies less often. Use of summary objects complicates this tradeoff. In the TD analysis, the use of summary objects adds a relatively stable overhead to all methods, since they are included in every method's abstract state. For the BU analysis, methods further down in the call chain will see fewer summary objects used, and method bodies may end up being analyzed less often than in the TD case. On the other hand, placeholder objects add more dimensions overall (one per read) and more work at call sites (to instantiate them). But, instantiating a summary may be cheaper than reanalyzing the method.

### **4.2 Context Sensitivity (CS)**

The last design choice we considered was context sensitivity. A *context-insensitive* (CI) analysis conflates information from different call sites of the same method. For example, two calls to method *m* in which the first passes *x*<sub>1</sub>, *y*<sub>1</sub> and the second passes *x*<sub>2</sub>, *y*<sub>2</sub> will be conflated such that within *m* we will only know that either *x*<sub>1</sub> or *x*<sub>2</sub> is the first parameter, and either *y*<sub>1</sub> or *y*<sub>2</sub> is the second; we will miss the correlation between parameters. A *context-sensitive* analysis provides some distinction among different call sites. A *1-CFA analysis* [46] (1CFA) distinguishes based on one level of calling context: two calls originating from different program points will be distinguished, but two calls from the same program point, inside a method that is itself called from two different points, will not. A *type-sensitive analysis* [49] (1TYP) uses the type of the receiver as the context.
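
The conflation a context-insensitive analysis performs can be illustrated with a small sketch (not the paper's code; intervals are joined pointwise, and names are invented):

```python
# Illustrative sketch: CI joins all call sites into one entry state for
# m, while 1-CFA keeps one entry state per calling program point.

def join(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))

def entry_states_ci(calls):
    # CI: a single abstract entry state, joined over all call sites.
    entry = None
    for _site, arg in calls:
        entry = arg if entry is None else join(entry, arg)
    return {"*": entry}

def entry_states_1cfa(calls):
    # 1-CFA: one entry state per (one level of) calling context.
    return {site: arg for site, arg in calls}

# m is called from two program points with very different arguments.
calls = [("site1", (0, 0)), ("site2", (100, 100))]
ci = entry_states_ci(calls)      # parameter is [0, 100]: precision lost
cfa = entry_states_1cfa(calls)   # each site keeps its exact argument
```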

Context sensitivity in the points-to analysis affects alias checks, e.g., when determining whether an assignment to *x.f* might affect *y.f*. It also affects the abstract object representation and call graph construction. Due to the latter, context sensitivity also affects our interprocedural numeric analysis. In a context-sensitive analysis, a single method is essentially treated as a family of methods indexed by a calling context. In particular, our analysis keeps track of the current context as a *frame*, and when considering a call to method x.m(), the target methods to which m may refer differ depending on the frame. This adds precision over a context-insensitive (i.e., frame-less) approach, but at greater expense: the analysis may consider the same method code many times. This is true both for TD and BU, but it is perhaps more detrimental to the latter since it reduces potential method summary reuse. On the other hand, a more precise analysis may reduce unnecessary work by pruning infeasible call graph edges. For example, when a call might dynamically dispatch to several different methods, the analysis must consider them all, joining their abstract states. A more precise analysis may consider fewer target methods.

### **5 Implementation**

We have implemented an analysis for Java with all of the options described in the previous two sections. Our implementation is based on the intermediate representation of the T. J. Watson Libraries for Analysis (WALA) version 1.3.10 [2], which converts a Java bytecode program into static single assignment (SSA) form [20], which is then analyzed. We use the APRON [33,41] implementation of intervals (trunk revision 1096, published on 2016/05/31), and ELINA [47,48] (snapshot as of October 4, 2017) for convex polyhedra. Our current implementation supports all non-floating-point numeric Java values and comprises 14K lines of Scala code.

Next we discuss a few additional implementation details.

*Preallocating Dimensions.* In both APRON and ELINA, it is very expensive to perform join operations that combine abstract states with different variables. Thus, rather than add dimensions as they arise during abstract interpretation, we instead *preallocate* all necessary dimensions—including for local variables, access paths, and summary objects, when enabled—at the start of a method body. This ensures the abstract states have the same dimensions at each join point. We found that, even though this approach makes some states larger than they need to be, the overall performance savings is still substantial.
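
The benefit of preallocation can be sketched as follows (hypothetical names, intervals as dicts). When every branch starts from a state with the same dimensions, the join at a merge point is pointwise and needs no costly unification of variable sets:

```python
# Sketch of why dimensions are preallocated: all dimensions are present
# (unconstrained) from the start of the method body, so abstract states
# at a join point always agree on their variable sets.
INF = float("inf")

def preallocate(dims):
    return {d: (-INF, INF) for d in dims}

state = preallocate(["x", "y", "a.f"])   # locals, access paths, ...
branch1 = {**state, "x": (0, 5)}
branch2 = {**state, "x": (10, 20)}
# Same dimensions on both sides, so the join is a pointwise interval hull.
joined = {v: (min(branch1[v][0], branch2[v][0]),
              max(branch1[v][1], branch2[v][1])) for v in state}
```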

*Arrays.* Our analysis encodes an array as an object with two fields, contents, which represents the contents of the array, and len, representing the array's length. Each read/write from a[i] is modeled as a weak read/write of contents (because all array elements are represented with the same field), with an added check that i is between 0 and len. We treat Strings as a special kind of array.
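
The bounds check implied by this model can be sketched with intervals (invented helper names; this is an illustration, not the implementation's code):

```python
# Sketch of the array model described above: an array is an object with
# a summary `contents` field and a `len` field, and a read a[i] is
# verified in bounds when 0 <= i < len holds for every value the
# intervals allow.
INF = float("inf")

def in_bounds(i_iv, len_iv):
    # Provably safe iff the index is non-negative and its largest
    # possible value is below the smallest possible length.
    return i_iv[0] >= 0 and i_iv[1] < len_iv[0]

arr = {"contents": (-INF, INF), "len": (10, 10)}  # e.g., new int[10]
safe = in_bounds((0, 9), arr["len"])         # verified in bounds
unverified = in_bounds((0, 10), arr["len"])  # i may equal len: fails
```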

*Widening.* As is standard in abstract interpretation, our implementation performs widening to ensure termination when analyzing loops. In a pilot study, we compared applying widening after anywhere from one to ten iterations. We found little added precision when widening after more than three iterations when trying to prove array indexes in bounds (our target application, discussed next). Thus our implementation widens after three iterations.
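
A minimal sketch of this strategy over intervals (illustrative names, not the implementation): iterate a loop's transfer function, switching to widening after three iterations to force termination.

```python
# Delayed interval widening: join for the first few iterations, then
# jump unstable bounds to infinity so the fixpoint loop terminates.
INF = float("inf")

def join(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    # Any bound still moving is widened straight to infinity.
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)

def loop_fixpoint(step, init, widen_after=3):
    state, k = init, 0
    while True:
        nxt = join(state, step(state))
        if nxt == state:
            return state
        state = widen(state, nxt) if k >= widen_after else nxt
        k += 1

# for (i = 0; ...; i++): without widening, i's upper bound never
# stabilizes; with it, the analysis terminates at [0, +inf).
result = loop_fixpoint(lambda iv: (iv[0] + 1, iv[1] + 1), (0, 0))
```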

*Limitations.* Our implementation is sound with a few exceptions. In particular, it ignores calls to native methods and uses of reflection. It is also unsound in its handling of recursive method calls. If the return value of a recursive method is numeric, it is regarded as unconstrained. Potential side effects of the recursive calls are not modeled.

### **6 Evaluation**

In this section, we present an empirical study of our family of analyses, focusing on the following research questions:

**RQ1: Performance.** How does the configuration affect analysis running time?

**RQ2: Precision.** How does the configuration affect analysis precision?

**RQ3: Tradeoffs.** How does the configuration affect the tradeoff between precision and performance?

To answer these questions, we chose an important analysis client, array index out-of-bound analysis, and ran it on the DaCapo benchmark suite [6]. We vary each of the analysis features listed in Table 1, yielding 162 total configurations. To understand the impact of analysis features, we used multiple linear regression and logistic regression to model precision and performance (the dependent variables) in terms of analysis features and across programs (the independent variables). We also studied per-program data directly.

Overall, we found that using access paths is a significant boon to precision but costs little in performance, while using summary objects is the reverse, to the point that use of summary objects is a significant source of timeouts. Polyhedra add precision compared to intervals, and impose some performance cost, though only half as much as summary objects. Interestingly, when both summary objects and polyhedra together would result in a timeout, choosing the first tends to provide better precision over the second. Finally, bottom-up analysis harms precision compared to top-down analysis, especially when only summary objects are enabled, but yields little gain in performance.

### **6.1 Experimental Setup**

We evaluated our analyses by using them to perform array index out of bounds analysis. More specifically, for each benchmark program, we counted how many array access instructions (x[i]=y, y=x[i], etc.) an analysis configuration could verify were in bounds (i.e., i<x.length), and measured the time taken to perform the analysis.

*Benchmarks.* We analyzed all eleven programs from the DaCapo benchmark suite [6] version 2006-10-MR2. The first three columns of Table 2 list the programs' names, their size (number of IR instructions), and the number of array bounds checks they contain. The rest of the table indicates the fastest and most precise analysis configuration for each program; we discuss these results in Sect. 6.4. We ran each benchmark three times under each of the 162 analysis configurations. The experiments were performed on two servers, each with a single 2.4 GHz Intel Xeon E5-2609 processor (four logical cores) and 128 GB of memory, running Ubuntu 16.04 (LTS). On each server, we ran three analysis configurations in parallel, binding each process to a designated core.

Since many analysis configurations are time-intensive, we set a limit of 1 hour for running a benchmark under a particular configuration. All performance results reported are the median of the three runs. We also use the median precision result, though note that the analyses are deterministic, so the precision does not vary except in the case of timeouts. Thus, we treat an analysis as not timing out as long as at least two of its three runs completed; otherwise, we count it as a timeout. Among the 1782 median results (11 benchmarks, 162 configurations), 667 (37%) timed out. The percentage of the configurations that timed out analyzing a program ranged from 0% (xalan) to 90% (chart).

**Table 2.** Benchmarks and overall results.

*Statistical Analysis.* To answer RQ1 and RQ2, we constructed a model for each question using multiple linear regression. Roughly put, we attempt to produce a model of performance (RQ1) and precision (RQ2)—the *dependent variables*—in terms of a linear combination of analysis configuration options (i.e., one choice from each of the five categories given in Table 1) and the benchmark program (i.e., one of the eleven subjects from DaCapo)—the *independent variables*. We include the programs themselves as independent variables, which allows us to roughly factor out program-specific sources of performance or precision gain/loss (which might include size, complexity, etc.); this is standard in this sort of regression [45]. Our models also consider all two-way interactions among analysis options. In our scenario, a significant interaction between two option settings suggests that the combination of them has a different impact on the analysis precision and/or performance compared to their independent impact.

To obtain a model that best fits the data, we performed variable selection via the Akaike Information Criterion (AIC) [12], a standard measure of model quality. AIC drops insignificant independent variables to better estimate the impact of analysis options. The R<sup>2</sup> values for the models are good, with the lowest of any model being 0.71.

After performing the regression, we examine the results to discover potential trends. Then we draw plots to examine how those trends manifest in the different programs. This lets us study the whole distribution, including outliers and any non-linear behavior, in a way that would be difficult if we just looked at the regression model. At the same time, if we only looked at plots it would be hard to see general trends because there is so much data.

*Threats to Validity.* There are several potential threats to the validity of our study. First, the benchmark programs may not be representative of programs that analysis users are interested in. That said, the programs were drawn from a well-studied benchmark suite, so they should provide useful insights.

Second, the insights drawn from the results of the array index out-of-bound analysis may not reflect the trends of other analysis clients. We note that array bounds checking is a standard, widely used analysis.

Third, we examined a design space of 162 analysis configurations, but there are other design choices we did not explore. Thus, there may be other independent variables that have important effects. In addition, there may be limitations specific to our implementation, e.g., due to precisely how WALA implements points-to analysis. Even so, we relied on time-tested implementations as much as possible, and arrived at our choices of analysis features by studying the literature and conversing with experts. Thus, we believe our study has value even if further variables are worth studying.

Fourth, for our experiments we ran each analysis configuration three times, so performance variation may not be fully accounted for. While more trials would add greater statistical assurance, each trial takes about a week to run on our benchmark machines, and we observed no variation in precision across the trials. We did observe variations in performance, but they were small and did not affect the broader trends. In more detail, we computed the variance of the running time among a set of three runs of a configuration as (max − min)/median. The average variance across all configurations is only 4.2%. The maximum total time difference (max − min) is 32 min, an outlier from eclipse. All the other time differences are within 4 min.
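
For concreteness, the variance measure just described, (max − min)/median over three runs, can be computed as follows (the times here are made up):

```python
# Variance of a configuration's running time over three runs,
# measured as (max - min) / median. Illustrative numbers only.

def run_variance(times):
    s = sorted(times)
    return (s[-1] - s[0]) / s[1]   # for three runs, s[1] is the median

var = run_variance([10.0, 10.2, 10.5])   # about 4.9%
```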

### **6.2 RQ1: Performance**

Table 3 summarizes our regression model for performance. We measure performance as the time to run both the core analysis and perform array index out-of-bounds checking. If a configuration timed out while analyzing a program, we set its running time as one hour, the time limit (characterizing a lower bound on the configuration's performance impact). Another option would have been to leave the configuration out of the regression, but doing so would underrepresent the important negative contribution to performance.

**Table 3. Model of run-time performance** in terms of analysis configuration options (Table 1), including two-way interactions. Independent variables for individual programs not shown. *R*<sup>2</sup> of 0.72.

In the top part of the table, the first column shows the independent variables and the second column shows a setting. One of the settings, identified by dashes in the remaining columns, is the baseline in the regression. We use the following settings as baselines: TD, AP+SO, 1TYP, ALLO, and POL. We chose the baseline according to what we expected to be the most precise settings. For the other settings, the third column shows the estimated effect of that setting with all other settings (including the choice of program, each an independent variable) held fixed. For example, the fifth row of the table shows that AP (only) decreases overall analysis time by 37.6 min compared to AP+SO (and the other baseline settings). The fourth column shows the 95% confidence interval around the estimate, and the last column shows the *p*-value. As is standard, we consider *p*-values less than 0.05 (5%) significant; such rows are highlighted green.

**Table 4. Model of timeout** in terms of analysis configuration options (Table 1). Independent variables for individual programs not shown. *R*<sup>2</sup> of 0.77.

The bottom part of the table shows the additional effects of two-way combinations of options compared to the baseline effects of each option. For example, the BU:CLAS row shows a coefficient of –8.87. We add this to the individual effects of BU (–1.98) and CLAS (–11.0) to compute that BU:CLAS is 21.9 min faster (since the number is negative) than the baseline pair of TD:ALLO. Not all interactions are shown, e.g., AO:CS is not in the table. Any interactions not included were deemed not to have meaningful effect and thus were dropped by the model generation process [12].

Setting the running time of a timed-out configuration as one hour in Table 3 may under-report a configuration's (negative) performance impact. For a more complete view, we follow the suggestion of Arcuri and Briand [3], and construct a model of success/failure using logistic regression. We consider "if a configuration timed out" as the categorical dependent variable, and the analysis configuration options and the benchmark programs as independent variables.

Table 4 summarizes our logistic regression model for timeout. The coefficients in the third column represent the change in log likelihood associated with each configuration setting, compared to the baseline setting. Negative coefficients indicate lower likelihood of timeout. The exponential of the coefficient, Exp(coef) in the fifth column, indicates roughly how strongly that configuration setting being turned on affects the likelihood relative to the baseline setting. For example, the third row of the table shows that BU is roughly 5 times less likely to time out compared to TD, a significant factor to the model.

Tables 3 and 4 present several interesting performance trends.

*Summary Objects Incur a Significant Slowdown.* Use of summary objects results in a very large slowdown, with high significance. We can see this in the AP row in Table 3. It indicates that using *only* AP results in an average 37.6-min speedup compared to the baseline AP+SO (while SO only had no significant difference from the baseline). We observed a similar trend in Table 4; use of summary objects has the largest effect, with high significance, on the likelihood of timeout. Indeed, 624 of the 667 analyses that timed out had summary objects enabled (i.e., SO or AP+SO). We investigated further and found that the slowdown from summary objects is mostly due to a significantly larger number of dimensions in the abstract state. For example, analyzing jython with AP-TD-CI-ALLO-INT involves, on average, 11 numeric variables per analyzed method, and the whole analysis finished in 15 min. Switching AP to SO resulted in, on average, 1473 variables per analyzed method, and the analysis ultimately timed out.

*The Polyhedral Domain is Slow, But Not as Slow as Summary Objects.* Choosing INT over baseline POL nets a speedup of 16.51 min. This is the second-largest performance effect with high significance, though it is half as large as the effect of SO. Moreover, per Table 4, turning on POL is more likely to result in timeout; 409 out of 667 analyses that timed out used POL.

*Heavyweight CS and OR Settings Hurt Performance, Particularly When Using Summary Objects.* For CS settings, CI is faster than baseline 1TYP by 7.1 min, while there is not a statistically significant difference with 1CFA. For the OR settings, we see that the more lightweight representations CLAS and SMUS are faster than baseline ALLO by 11.00 and 7.15 min, respectively, when using baseline AP+SO. This makes sense because these representations have a direct effect on reducing the number of summary objects. Indeed, when summary objects are disabled, the performance benefit disappears: AP:CLAS and AP:SMUS add back 9.55 and 6.25 min, respectively.

*Bottom-up Analysis Provides No Substantial Performance Advantage.* Table 4 indicates that a BU analysis is less likely to time out than a TD analysis. However, the performance model in Table 3 does not show a performance advantage of bottom-up analysis: neither BU nor TD+BU provide a statistically significant impact on running time over baseline TD. Setting one hour for the configurations that timed out in the performance model might fail to capture the negative performance of top-down analysis. This observation underpins the utility of constructing a success/failure analysis to complement the performance model. In any case, we might have expected bottom-up analysis to provide a real performance advantage (Sect. 4.1), but that is not what we have observed.

### **6.3 RQ2: Precision**

Table 5 summarizes our regression model for precision, using the same format as Table 3. We measure precision as the number of array indexes proven to be in



bounds. As recommended by Arcuri and Briand [3], we omit from the regression those configurations that timed out.<sup>5</sup> We see several interesting trends.

*Access Paths are Critical to Precision.* Removing access paths from the configuration, by switching from AP+SO to SO, yields significantly lower precision. We see this in the SO (only) row in the table, and in all of its interactions (i.e., SO:*opt* and *opt*:SO rows). In contrast, AP on its own is not statistically worse than AP+SO, indicating that summary objects often add little precision. This is unfortunate, given their high performance cost.

*Bottom-up Analysis Harms Precision Overall, Especially for SO (Only).* BU has a strongly negative effect on precision: 129.98 fewer checks compared to TD. Coupled with SO it fares even worse: BU:SO nets 686.79 fewer checks, and TD+BU:SO nets 630.99 fewer. For example, for xalan the most precise configuration, which uses TD and AP+SO, discharges 981 checks, while all configurations

<sup>5</sup> The alternative of setting precision to be 0 would misrepresent the general power of a configuration, particularly when combined with runs that did not time out. Fewer runs might reduce statistical power, however, which is captured in the model.

that instead use BU and SO on xalan discharge close to zero checks. The same basic trend holds for just about every program.

*The Relational Domain Only Slightly Improves Precision.* The row for INT is not statistically different from the baseline POL. This is a bit of a surprise, since by itself POL is strictly more precise than INT. In fact, POL does improve precision empirically when coupled with either AP or SO—the interactions AP:INT and SO:INT reduce the number of discharged checks. This sets up an interesting tradeoff that we explore in Sect. 6.4: using AP+SO with INT vs. using AP with POL.

*More Precise Abstract Object Representation Improves Precision, But Context Sensitivity Does Not.* The table shows CLAS discharges 90.15 fewer checks compared to ALLO. Examining the data in detail, we found this occurred because CLAS conflates all arrays of the same type as one abstract object, thus imprecisely approximating those arrays' lengths, in turn causing some checks to fail.

Also notice that context sensitivity (CS) does not appear in the model, meaning it does not significantly increase or decrease the precision of array bounds checking. This is interesting, because context-sensitivity is known to reduce points-to set size [35,49] (thus yielding more precise alias checks and dispatch targets). However, for our application this improvement has minimal impact.

### **6.4 RQ3: Tradeoffs**

Finally, we examine how analysis settings affect the tradeoff between precision and performance. To begin our discussion, recall Table 2, which shows the fastest configuration and the most precise configuration for each benchmark. Further, the table shows the configurations' running time, number of checks discharged, and percentage of checks discharged.

We see several interesting patterns in this table, though note the table shows just two data points and not the full distribution. First, the configurations in each column are remarkably consistent. The fastest configurations are all of the form BU-AP-CI-\*-INT, only varying in the abstract object representation. The most precise configurations are more variable, but all include TD and some form of AP. The rest of the options differ somewhat, with different forms of precision benefiting different benchmarks. Finally, notice that, overall, the fastest configurations are much faster than the most precise configurations—often by an order of magnitude—but they are not that much less precise—typically by 5–10 percentage points.

To delve further into the tradeoff, we examine, for each program, the overall performance and precision distribution for the analysis configurations, focusing on particular options (HA, AO, etc.). As settings of option HA have come up prominently in our discussion so far, we start with it and then move through the other options. Figure 1 gives per-benchmark scatter plots of this data. Each plotted point corresponds to one configuration, with its performance on the *x*-axis and number of discharged array bounds checks on the *y*-axis. We regard a configuration that times out as discharging no checks, so it is plotted at (60, 0).

**Fig. 1.** Tradeoffs: AP vs. SO vs. AP+SO.

**Fig. 2.** Tradeoffs: TD vs. BU vs. TD+BU.

**Fig. 3.** Tradeoffs: ALLO vs. SMUS vs. CLAS.

The shape of a point indicates the HA setting of the corresponding configuration: black circle for AP, red triangle for AP+SO, and blue cross for SO.

As a general trend, we see that *access paths improve precision and do little to harm performance; they should always be enabled.* More specifically, configurations using AP and AP+SO (when they do not time out) are always toward the top of the graph, meaning good precision. Moreover, the performance profile of SO and AP+SO is quite similar, as evidenced by related clusters in the graphs differing along the *y*-axis but not the *x*-axis. In only one case did AP+SO time out when SO alone did not.<sup>6</sup>

On the flip side, *summary objects are a significant performance bottleneck for a small boost in precision.* On the graphs, we can see that the black AP circles are often among the most precise, while AP+SO tend to be the best (8*/*11 cases in Table 2). But AP are much faster. For example, for bloat, chart, and jython, only AP configurations complete before the timeout, and for pmd, all but four of the configurations that completed use AP.

*Top-Down Analysis is Preferred: Bottom-up is Less Precise and Does Little to Improve Performance.* Figure 2 shows a scatter plot of the precision/performance behavior of all configurations, distinguishing those with BU (black circles), TD (red triangles), and TD+BU (blue crosses). Here the trend is not as stark as with HA, but we can see that the mass of TD points is towards the upper left of the plots, except for some timeouts, while BU and TD+BU have more configurations at the bottom, with low precision. By comparing the same (x, y) coordinate on a graph in this figure with the corresponding graph in the previous one, we can see options interacting. Observe that the cluster of black circles at the lower left for antlr in Fig. 2(a) corresponds to SO-only configurations in Fig. 1(a), illustrating the strong negative interaction on precision of BU:SO discussed in the previous subsection. The figures (and Table 2) also show that the best-performing configurations involve bottom-up analysis, but usually the

<sup>6</sup> In particular, for eclipse, configuration TD+BU-SO-1CFA-ALLO-POL finished at 59 min, while TD+BU-AP+SO-1CFA-ALLO-POL timed out.

**Fig. 4.** Tradeoffs: INT vs. POL.

benefit is inconsistent and very small. And TD+BU does not seem to balance the precision/performance tradeoff particularly well.

*Precise Object Representation Often Helps with Precision at a Modest Cost to Performance.* Figure 3 shows a representative sample of scatter plots illustrating the tradeoff between ALLO, CLAS, and SMUS. In general, the highest points tend to be ALLO, and these sit to the right of CLAS and SMUS. On the other hand, the precision gain of ALLO tends to be modest, and it usually occurs (examining individual runs) when combined with AP+SO. However, summary objects and ALLO together greatly increase the risk of timeouts and low performance. For example, for eclipse the row of circles across the bottom are all SO-only.

*The Precision Gains of POLY are More Modest than Gains Due to Using AP+SO (over AP).* Figure 4 shows scatter plots comparing INT and POLY. We investigated several groupings in more detail and found an interesting interaction between the numeric domain and the heap abstraction: POLY is often better than INT for AP (only). For example, the points in the upper left of bloat use AP, and POLY is slightly better than INT. The same phenomenon occurs in luindex in the cluster of triangles and circles to the upper left. But INT does better further up and to the right in luindex. This is because these configurations use AP+SO, which times out when POLY is enabled. A similar phenomenon occurs for the two points in the upper right of pmd, and the most precise points for hsqldb. Indeed, when a configuration with AP+SO-INT terminates, it will be more precise than those with AP-POLY, but is likely slower. We manually inspected the cases where AP+SO-INT is more precise than AP-POLY, and found that it mostly is because of the limitation that access paths are dropped through method calls. AP+SO rarely terminates when coupled with POLY because of the very large number of dimensions added by summary objects.

### **7 Related Work**

Our numeric analysis is novel in its focus on fully automatically identifying numeric invariants in real (heap-manipulating, method-calling) Java programs, while aiming to be sound. We know of no prior work that carefully studies precision and performance tradeoffs in this setting. Prior work tends to be much more imprecise and/or intentionally unsound, but scale better, or more precise, but not scale to programs as large as those in the DaCapo benchmark suite.

*Numeric vs. Heap Analysis.* Many abstract interpretation-based analyses focus on numeric properties or heap properties, but not both. For example, Calcagno et al. [13] use separation logic to create a compositional, bottom-up heap analysis. Their client analysis for Java checks for NULL pointers [1], but not out-of-bounds array indexes. Conversely, the PAGAI analyzer [31] for LLVM explores abstract interpretation algorithms for precise invariants of numeric variables, but ignores the heap (soundly treating heap locations as ⊤).

*Numeric Analysis in Heap-Manipulating Programs.* Fu [25] first proposed the basic summary object heap abstraction we explore in this paper. The approach uses a points-to analysis [44] as the basis of generating abstract names for summary objects that are weakly updated [27]. The approach does not support strong updates to heap objects and ignores procedure calls, making unsound assumptions about the effects of calls to or from the procedure being analyzed. Fu's evaluation on DaCapo only considered how often the analysis yields a non-⊤ field, while ours considers how often the analysis can prove that an array index is in bounds, which is a more direct measure of utility. Our experiments strongly suggest that when modeled soundly and at scale, summary objects add enormous performance overhead while doing much less to assist precision when compared to strongly updatable access paths alone [21,52].

Some prior work focuses on inferring precise invariants about heap-allocated objects, e.g., relating the presence of an object in a collection to the value of one of the object's fields. Ferrara et al. [23,24] also propose a composed analysis for numeric properties of heap-manipulating programs. Their approach is amenable to both points-to and shape analyses (e.g., TVLA [34]), supporting strong updates for the latter. Deskcheck [39] and Chang and Rival [14,15] also aim to combine shape analysis and numeric analysis, in both cases requiring the analyst to specify predicates about the data structures of interest. Magill [37] automatically converts heap-manipulating programs into integer programs such that proving a numeric property of the latter implies a numeric shape property (e.g., a list's length) of the former. The systems just described support more precise invariants than our approach, but are less general or scalable: they tend to focus on much smaller programs; they do not support important language features (e.g., Ferrara's approach lacks procedures, Deskcheck lacks loops); and they may require manual annotation.

Clousot [22] also aims to check numeric invariants on real programs that use the heap. Methods are analyzed in isolation but require programmer-specified pre/post conditions and object invariants. In contrast, our interprocedural analysis is fully automated, requiring no annotations. Clousot's heap analysis makes local, optimistic (and unsound) assumptions about aliasing,<sup>7</sup> while our approach aims to be sound by using a global points-to analysis.

*Measuring Analysis Parameter Tradeoffs.* We are not aware of work exploring performance/precision tradeoffs of features in realistic abstract interpreters. Oftentimes, papers leave out important algorithmic details. The initial Astrée paper [7] contains a wealth of ideas, but does not evaluate them systematically, instead reporting anecdotal observations about their particular analysis targets. More often, papers focus on one element of an analysis to evaluate, e.g., Logozzo [36] examines precision and performance tradeoffs useful for certain kinds of numeric analyses, and Ferrara [24] evaluates his technique using both intervals and octagons as the numeric domain. Regarding the latter, our paper shows that interactions with the heap abstraction can have a strong impact on

<sup>7</sup> Interestingly, Clousot's assumptions often, but not always, lead to sound results [16].

the numeric domain precision/performance tradeoff. Prior work by Smaragdakis et al. [49] investigates the performance/precision tradeoffs of various implementation decisions in points-to analysis. Paddle [35] evaluates tradeoffs among different abstractions of heap allocation sites in a points-to analysis, but specifically only evaluates the heap analysis and not other analyses that use it.

### **8 Conclusion and Future Work**

We presented a family of static numeric analyses for Java. These analyses implement a novel combination of techniques to handle method calls, heap-allocated objects, and numeric analysis. We ran the 162 resulting analysis configurations on the DaCapo benchmark suite, and measured performance and precision in proving array indexes in bounds. Using a combination of multiple linear regression and data visualization, we found several trends. Among others, we discovered that strongly updatable access paths are always a good idea, adding significant precision at very little performance cost. We also found that top-down analysis tended to improve precision at little cost, compared to bottom-up analysis. On the other hand, while summary objects did add precision when combined with access paths, they also added significant performance overhead, often resulting in timeouts. The polyhedral numeric domain improved precision, but would time out when using a richer heap abstraction; intervals and a richer heap worked better.

The results of our study suggest several directions for future work. For example, for many programs, a much more expensive analysis often did not add much more in terms of precision; a pre-analysis that identifies the tradeoff would be worthwhile. Another direction is to investigate a more sparse representation of summary objects that retains their modest precision benefits, but avoids the overall blowup. We also plan to consider other analysis configuration options. Our current implementation uses an ahead-of-time points-to analysis to model the heap; an alternative solution is to analyze the heap along with the numeric analysis [43]. Concerning abstract object representation and context sensitivity, there are other potentially interesting choices, e.g., recency abstraction [5] and object sensitivity [40]. Other interesting dimensions to consider are field sensitivity [32] and widening, notably *widening with thresholds*. Finally, we plan to explore other effective ways to design hybrid top-down and bottom-up analysis [54], and investigate sparse inter-procedural analysis for better performance [42].

**Acknowledgments.** We thank Gagandeep Singh for his help in debugging ELINA. We thank Arlen Cox, Xavier Rival, and the anonymous reviewers for their detailed feedback and comments. This research was supported in part by DARPA under contracts FA8750-15-2-0104 and FA8750-16-C-0022.

### **References**



**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **An Abstract Interpretation Framework for Input Data Usage**

Caterina Urban and Peter Müller

Department of Computer Science, ETH Zurich, Zurich, Switzerland {caterina.urban,peter.mueller}@inf.ethz.ch

**Abstract.** Data science software plays an increasingly important role in critical decision making in fields ranging from economy and finance to biology and medicine. As a result, errors in data science applications can have severe consequences, especially when they lead to results that look plausible, but are incorrect. A common cause of such errors is when applications erroneously ignore some of their input data, for instance due to bugs in the code that reads, filters, or clusters it.

In this paper, we propose an abstract interpretation framework to automatically detect unused input data. We derive a program semantics that precisely captures data usage by abstraction of the program's operational trace semantics and express it in a constructive fixpoint form. Based on this semantics, we systematically derive static analyses that automatically detect unused input data by fixpoint approximation.

This clear design principle provides a framework that subsumes existing analyses; we show that secure information flow analyses and a form of live variables analysis can be used for data usage, with varying degrees of precision. Additionally, we derive a static analysis to detect single unused data inputs, which is similar to dependency analyses used in the context of backward program slicing. Finally, we demonstrate the value of expressing such analyses as abstract interpretation by combining them with an existing abstraction of compound data structures such as arrays and lists to detect unused chunks of the data.

### **1 Introduction**

In the past few years, data science has grown considerably in importance and now heavily influences many domains, ranging from economy and finance to biology and medicine. As we rely more and more on data science for making decisions, we become increasingly vulnerable to programming errors.

Programming errors can cause frustration, especially when they lead to a program failure after hours of computation. However, programming errors that do not cause failures can have more serious consequences as code that produces an erroneous but plausible result gives no indication that something went wrong. A notable example is the paper "Growth in a Time of Debt" published in 2010 by economists Reinhart and Rogoff, which was widely cited in political debates and

**Fig. 1.** Overview of the program semantics presented in the paper. The *dependency semantics*, derived by abstraction of the *trace semantics*, is sound and complete for data usage. Further sound but not complete abstractions are shown on the right.

was later demonstrated to be flawed. Notably, one of the flaws was a programming error, which *entirely excluded some data* from the analysis [23]. Its critics hold that this paper led to unjustified adoption of austerity policies for countries with various levels of public debt [30]. Programming errors in data analysis code for medical applications are even more critical [27]. It is thus paramount to achieve a high level of confidence in the correctness of data science code.

The likelihood that a programming error causes some input data to remain unused is particularly high for data science applications, where data goes through long pipelines of modules that acquire, filter, merge, and manipulate it. In this paper, we propose an abstract interpretation [14] framework to automatically detect *unused input data*. We characterize when a program uses (some of) its input data using the notion of *dependency* between the input data and the *outcome* of the program. Our notion of dependency accounts for non-determinism and non-termination. Thus, it encompasses notions of dependency that arise in many different contexts, such as secure information flow and program slicing [1], as well as provenance or lineage analysis [9], to name a few.

Following the theory of abstract interpretation [12], we systematically derive a new program semantics that captures exactly the information needed to reason about input data usage, abstracting away from irrelevant details about the program behavior. Figure 1 gives an overview of our approach. The semantics is first expressed in a constructive fixpoint form over *sets of sets of traces*, by partitioning the operational trace semantics of a program based on its outcome (cf. *outcome semantics* in Fig. 1), and a further abstraction ignores intermediate state computations (cf. *dependency semantics* in Fig. 1). Starting the development of the semantics from the operational trace semantics enables uniform mathematical reasoning about program semantics and program properties (Sect. 3). In particular, since input data usage is not a trace property or a subset-closed property [11] (Sect. 4), we show that a formulation of the semantics using sets of sets of traces is necessary for a sound validation of input data usage via fixpoint approximation [28].

This clear design principle provides a unifying framework for reasoning about existing analyses based on dependencies. We survey existing analyses, identify key design decisions that limit or facilitate their applicability to input data usage, and assess their precision. We show that non-interference analyses [6] are sound for proving that a *terminating* program does not use *any* of its input data, although this is too strong a property in general. We prove that strongly live variable analysis [20] is sound for data usage even for non-terminating programs, although it is imprecise with respect to implicit dependencies between program variables. We then derive a more precise static analysis similar to dependency analyses used in the context of backward program slicing [37]. Finally, we demonstrate the value of expressing these analyses as abstract interpretations by combining them with an existing abstraction of compound data structures such as arrays and lists [16]. This allows us to detect unused chunks of the input data, and thus apply our work to realistic data science applications.

### **2 Trace Semantics**

The *semantics* of a program is a mathematical characterization of its behavior when executed for all possible input data. We model the operational semantics of a program as a *transition system* $\langle \Sigma, \tau \rangle$, where $\Sigma$ is a (potentially infinite) set of program states and the transition relation $\tau \subseteq \Sigma \times \Sigma$ describes the possible transitions between states [12,14]. Note that this model allows representing programs with (possibly unbounded) non-determinism. The set $\Omega \stackrel{\text{def}}{=} \{s \in \Sigma \mid \forall s' \in \Sigma\colon \langle s, s' \rangle \notin \tau\}$ is the set of *final states* of the program.

In the following, let $\Sigma^n \stackrel{\text{def}}{=} \{s_0 \cdots s_{n-1} \mid \forall i < n\colon s_i \in \Sigma\}$ be the set of all sequences of exactly $n$ program states. We write $\varepsilon$ to denote the empty sequence, i.e., $\Sigma^0 \stackrel{\text{def}}{=} \{\varepsilon\}$. Let $\Sigma^* \stackrel{\text{def}}{=} \bigcup_{n \in \mathbb{N}} \Sigma^n$ be the set of all finite sequences, $\Sigma^+ \stackrel{\text{def}}{=} \Sigma^* \setminus \Sigma^0$ be the set of all non-empty finite sequences, $\Sigma^\omega$ be the set of all infinite sequences, $\Sigma^{+\infty} \stackrel{\text{def}}{=} \Sigma^+ \cup \Sigma^\omega$ be the set of all non-empty finite or infinite sequences, and $\Sigma^{*\infty} \stackrel{\text{def}}{=} \Sigma^* \cup \Sigma^\omega$ be the set of all finite or infinite sequences of program states. In the following, we write $\sigma\sigma'$ for the concatenation of two sequences $\sigma, \sigma' \in \Sigma^{*\infty}$ (with $\sigma\varepsilon = \varepsilon\sigma = \sigma$, and $\sigma\sigma' = \sigma$ when $\sigma \in \Sigma^\omega$), $T^+ \stackrel{\text{def}}{=} T \cap \Sigma^+$ and $T^\omega \stackrel{\text{def}}{=} T \cap \Sigma^\omega$ for the selection of the non-empty finite sequences and the infinite sequences of $T \in \mathcal{P}(\Sigma^{*\infty})$, and $T \mathbin{;} T' \stackrel{\text{def}}{=} \{\sigma s \sigma' \mid s \in \Sigma \wedge \sigma s \in T \wedge s \sigma' \in T'\}$ for the merging of two sets of sequences $T \in \mathcal{P}(\Sigma^+)$ and $T' \in \mathcal{P}(\Sigma^{+\infty})$, when a finite sequence in $T$ terminates with the initial state of a sequence in $T'$.
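The machinery above can be made concrete in a few lines of code. The following is a minimal Python sketch (not from the paper; the toy state space, names, and representation of traces as tuples are all illustrative) of a transition system, its set of final states Ω, and the merging operator $T \mathbin{;} T'$ on finite sequences:

```python
# Minimal sketch (illustrative, not from the paper): states form a small set,
# a finite sequence of states is a tuple, and tau is a set of state pairs.

def final_states(states, tau):
    """Omega: the states with no outgoing transition in tau."""
    return {s for s in states if all((s, t) not in tau for t in states)}

def merge(T1, T2):
    """The merging operator T1 ; T2: join a finite sequence in T1 with a
    sequence in T2 that starts in its last state, sharing that state."""
    return {s1 + s2[1:] for s1 in T1 for s2 in T2
            if s1 and s2 and s1[-1] == s2[0]}

# Toy transition system: a -> b -> c, with c final.
states = {"a", "b", "c"}
tau = {("a", "b"), ("b", "c")}

assert final_states(states, tau) == {"c"}
assert merge({("a", "b")}, {("b", "c")}) == {("a", "b", "c")}
```

Note that `merge` keeps a single copy of the shared state, exactly as in the definition of $T \mathbin{;} T'$.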

Given a transition system $\langle \Sigma, \tau \rangle$, a *trace* is a non-empty sequence of program states described by the transition relation $\tau$, that is, $\langle s, s' \rangle \in \tau$ for each pair of

$$T_0 = \Sigma^\omega \qquad T_1 = \Omega \cup (\tau \mathbin{;} \Sigma^\omega) \qquad T_2 = \Omega \cup (\tau \mathbin{;} \Omega) \cup (\tau \mathbin{;} \tau \mathbin{;} \Sigma^\omega)$$

**Fig. 2.** First fixpoint iterates of the trace semantics $\Lambda$.

consecutive states $s, s' \in \Sigma$ in the sequence. The set of final states $\Omega$ and the transition relation $\tau$ can be understood as sets of traces of length one and length two, respectively. The *trace semantics* $\Lambda \in \mathcal{P}(\Sigma^{+\infty})$ generated by a transition system $\langle \Sigma, \tau \rangle$ is the union of all finite traces that terminate with a final state in $\Omega$, and all infinite traces. It can be expressed as a least fixpoint in the complete lattice $\langle \mathcal{P}(\Sigma^{+\infty}), \sqsubseteq, \sqcup, \sqcap, \Sigma^\omega, \Sigma^+ \rangle$ [12]:

$$\Lambda = \text{lfp}^{\sqsubseteq}\, \Theta \qquad \Theta(T) \stackrel{\text{def}}{=} \Omega \cup (\tau \mathbin{;} T) \tag{1}$$

where the computational order is $T_1 \sqsubseteq T_2 \stackrel{\text{def}}{=} T_1^+ \subseteq T_2^+ \wedge T_1^\omega \supseteq T_2^\omega$. Figure 2 illustrates the first fixpoint iterates. The fixpoint iteration starts from the set of all infinite *sequences* of program states. At each iteration, the final program states in $\Omega$ are added to the set, and sequences already in the set are extended by prepending transitions to them. In this way, we *add* increasingly longer finite traces, and we *remove* infinite sequences of states with increasingly longer prefixes not forming traces. In particular, the $i$-th iterate builds all finite traces of length less than or equal to $i$, and selects all infinite sequences whose prefixes of length $i$ form traces. At the limit we obtain all infinite traces and all finite traces that terminate in a final state in $\Omega$. Note that $\Lambda$ is *suffix-closed*.
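The fixpoint characterization of Eq. 1 can be sketched in code. The iterates below compute only the finite part of $\Theta(T) = \Omega \cup (\tau \mathbin{;} T)$, since infinite sequences cannot be enumerated; the paper's iteration starts from $\Sigma^\omega$, but its finite parts coincide with iterating from the empty set, and for a finite acyclic system these finite traces reach a fixpoint (the toy system is again illustrative):

```python
# Sketch (assumption: only the finite part of the iterates is computed; the
# infinite sequences carried by the paper's iterates are omitted).

def theta(T, omega, tau):
    """Theta(T) = Omega ∪ (tau ; T), restricted to finite traces."""
    prepended = {(s,) + t for (s, s2) in tau for t in T if t[0] == s2}
    return {(s,) for s in omega} | prepended

def finite_traces(states, tau):
    """Iterate Theta until the set of finite traces stabilizes."""
    omega = {s for s in states if all((s, t) not in tau for t in states)}
    T = set()
    while True:
        T_next = theta(T, omega, tau)
        if T_next == T:
            return T
        T = T_next

# Toy system: a -> b -> c and a -> c, with c final.
traces = finite_traces({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("a", "c")})
assert traces == {("c",), ("b", "c"), ("a", "c"), ("a", "b", "c")}
```

As in the text, the $i$-th iterate contains exactly the finite traces of length at most $i$ that end in a final state.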

The trace semantics Λ fully describes the behavior of a program. However, to reason about a particular property of a program, it is not necessary to consider all aspects of its behavior. In fact, reasoning is facilitated by the design of a semantics that abstracts away from irrelevant details about program executions. In the next sections, we define our property of interest and use abstract interpretation [14] to systematically derive, by successive abstractions of the trace semantics, a semantics that precisely captures such a property.

### **3 Input Data Usage**

A *property* is specified by its extension, that is, the set of elements having such a property [14,15]. Thus, properties of program traces in $\Sigma^{+\infty}$ are sets of traces in $\mathcal{P}(\Sigma^{+\infty})$, and properties of programs with trace semantics in $\mathcal{P}(\Sigma^{+\infty})$ are *sets of sets of traces* in $\mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$. Accordingly, a program $P$ satisfies a property $\mathcal{H} \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ if and only if its semantics $[\![P]\!] \in \mathcal{P}(\Sigma^{+\infty})$ belongs to $\mathcal{H}$:

$$P \models \mathcal{H} \Leftrightarrow [\![P]\!] \in \mathcal{H} \tag{2}$$

Some program properties are defined in terms of individual program traces and can be equivalently expressed as trace properties. This is the case for the traditional safety [26] and liveness [4] properties of programs. In such a case, a program P satisfies a trace property T if and only if all traces in its semantics [[P]] belong to the property: P |= T ⇔ [[P]] ⊆ T .

Program properties that establish a relation between different program traces cannot be expressed as trace properties [11]. Examples are security properties such as *non-interference* [21,35]. In this paper, we consider a closely related but more general property called *input data usage*, which expresses that *the outcome of a program does not depend on (some of ) its input data*. The notion of *outcome* accounts for non-determinism as well as non-termination. Thus, our notion of dependency encompasses non-interference as well as notions of dependency that arise in many other contexts [1,9]. We further explore this in Sects. 8 to 10.

Let each program $P$ with trace semantics $[\![P]\!]$ have a set $\mathbf{I}_P$ of input variables and a set $\mathbf{O}_P$ of output variables<sup>1</sup>. For simplicity, we assume that these variables are all of the same type (e.g., boolean variables) and that their values all belong to a set $\mathcal{V}$ of possible values (e.g., $\mathcal{V} = \{\mathtt{t}, \mathtt{f}\}$, where $\mathtt{t}$ is the boolean value true and $\mathtt{f}$ is the boolean value false). Given a trace $\sigma \in [\![P]\!]$, we write $\sigma[0]$ to denote its initial state and $\sigma[\omega]$ to denote its outcome, that is, its final state if the trace is finite or $\bot$ if the trace is infinite. The input variables at the initial states of the traces of a program store the values of its input data: we write $\sigma[0](i)$ to denote the value of the input data stored in the input variable $i$ at the initial state of the trace $\sigma$, and $\sigma_1[0] \neq_i \sigma_2[0]$ to denote that the initial states of two traces $\sigma_1$ and $\sigma_2$ disagree on the value of the input variable $i$ but agree on the values of all other variables. The output variables at the final states of the finite traces of a program store its result: we write $\sigma[\omega](o)$ to denote the result stored in the output variable $o$ at the final state of a finite trace $\sigma$. We can now formally define when an input variable $i \in \mathbf{I}_P$ is *unused* with respect to a program with trace semantics $[\![P]\!] \in \mathcal{P}(\Sigma^{+\infty})$:

$$\begin{aligned} \mathrm{unused}_i([\![P]\!]) \stackrel{\text{def}}{=} \forall \sigma \in [\![P]\!], v \in \mathcal{V}\colon\ &\sigma[0](i) \neq v \Rightarrow \\ \exists \sigma' \in [\![P]\!]\colon\ &\sigma'[0] \neq_i \sigma[0] \wedge \sigma'[0](i) = v \wedge \sigma[\omega] = \sigma'[\omega] \end{aligned} \tag{3}$$

Intuitively, an input variable i is unused if all feasible program outcomes (e.g., the outcome σ[ω] of a trace σ) are feasible from all possible initial values of i (i.e., for all possible initial values v of i that differ from the initial value of i in σ, there exists a trace with initial value v for i that has the same outcome σ[ω]). In other words, the outcome of the program is the same independently of

<sup>1</sup> The approach can be easily extended to infinite inputs and/or outputs via abstractions such as the one later presented in Sect. 11.

```
 1 english = input()
 2 math = input()
 3 science = input()
 4 bonus = input()
 5
 6 passing = True
 7 if not english: english = False   # english should be passing
 8 if not math: passing = bonus
 9 if not math: passing = bonus      # math should be science
10
11 print(passing)
```
**Fig. 3.** Simple program to check if a student has passed three school subjects. The programmer has made two mistakes at line 7 and at line 9, which cause the input data stored in the variables english and science to be unused.

the initial value of the input variable i. Note that this definition accounts for non-determinism (since it considers each program outcome independently) and non-termination (since a program outcome can be ⊥).

*Example 1.* Let us consider the simple program P in Fig. 3. Based on the input variables english, math, and science (cf. lines 1–3), the program is supposed to check if a student has passed all three considered school subjects and store the result in the output variable passing (cf. line 11). For mathematics and science, the student is allowed a bonus based on the input variable bonus (cf. line 8 and 9). However, the programmer has made two mistakes at line 7 and at line 9, which cause the input variables english and science to be unused.

Let us now consider the input variable science. The trace semantics of the program (simplified to consider only the variables science and passing) is:

$$[\![P]\!]_{\texttt{science}} = \{(\mathtt{t}{\scriptstyle\bullet}) \ldots (\mathtt{t}\mathtt{t}),\ (\mathtt{f}{\scriptstyle\bullet}) \ldots (\mathtt{f}\mathtt{t}),\ (\mathtt{t}{\scriptstyle\bullet}) \ldots (\mathtt{t}\mathtt{f}),\ (\mathtt{f}{\scriptstyle\bullet}) \ldots (\mathtt{f}\mathtt{f})\}$$

where each state $(v_1 v_2)$ shows the boolean value $v_1$ of science and $v_2$ of passing, and ${\scriptstyle\bullet}$ denotes any boolean value. We omitted the trace suffixes for brevity. The input variable science is *unused*, since each result value ($\mathtt{t}$ or $\mathtt{f}$) for passing is feasible from all possible initial values of science. Note that all other outcomes of the program (i.e., non-termination) are not feasible.

Let us now consider the input variable math. The trace semantics of the program (now simplified to only consider math and passing) is the following:

$$[\![P]\!]_{\texttt{math}} = \{(\mathtt{t}{\scriptstyle\bullet}) \ldots (\mathtt{t}\mathtt{t}),\ (\mathtt{f}{\scriptstyle\bullet}) \ldots (\mathtt{f}\mathtt{t}),\ (\mathtt{f}{\scriptstyle\bullet}) \ldots (\mathtt{f}\mathtt{f})\}$$

In this case, the input variable math is used, since only the initial state $(\mathtt{f}{\scriptstyle\bullet})$ yields the result value $\mathtt{f}$ for passing (in the final state $(\mathtt{f}\mathtt{f})$). □
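Under the paper's boolean simplification, the unused condition of Eq. 3 can be checked by brute force on the program of Fig. 3. The Python sketch below (illustrative encoding, not from the paper: since the program is deterministic, each trace is collapsed to a pair of its initial input values and its outcome) confirms that english and science are unused while math is not:

```python
from itertools import product

# Sketch (illustrative): enumerate all boolean inputs of the Fig. 3 program
# and represent each trace by (initial input values, outcome).

def program(english, math, science, bonus):
    passing = True
    if not english: english = False   # bug: should assign to passing
    if not math: passing = bonus
    if not math: passing = bonus      # bug: should test science
    return passing

INPUTS = ["english", "math", "science", "bonus"]
traces = {(vals, program(*vals)) for vals in product([True, False], repeat=4)}

def unused(i, traces):
    """Eq. 3: every outcome must stay feasible when i's initial value flips
    (all other initial values held fixed)."""
    idx = INPUTS.index(i)
    for (init, out) in traces:
        for v in (True, False):
            if init[idx] != v:
                flipped = init[:idx] + (v,) + init[idx + 1:]
                if (flipped, out) not in traces:
                    return False
    return True

assert unused("english", traces) and unused("science", traces)
assert not unused("math", traces) and not unused("bonus", traces)
```

For a deterministic, terminating program this membership test coincides with Eq. 3: flipping the value of an unused input must reproduce the same outcome.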

The input data usage property N can now be formally defined as follows:

$$\mathcal{N} \stackrel{\text{def}}{=} \left\{[\![P]\!] \in \mathcal{P}(\Sigma^{+\infty}) \mid \forall i \in \mathbf{I}_P\colon \mathrm{unused}_i([\![P]\!])\right\} \tag{4}$$

which states that the outcome of a program does not depend on *any* of its input data. In practice, one is interested in weaker input data usage properties for a subset $J$ of the input variables, i.e., $\mathcal{N}_J \stackrel{\text{def}}{=} \{[\![P]\!] \in \mathcal{P}(\Sigma^{+\infty}) \mid \forall i \in J \subseteq \mathbf{I}_P\colon \mathrm{unused}_i([\![P]\!])\}$.

In the following, we use abstract interpretation to reason about input data usage. In the next section, we discuss the challenges to the application of the standard abstract interpretation framework that emerge from the fact that input data usage cannot be expressed as a trace property.

### **4 Sound Input Data Usage Validation**

In the standard framework of abstract interpretation, one defines a semantics that precisely captures a property $\mathcal{S}$ of interest by abstraction of the trace semantics $\Lambda$ [12]. Then, further abstractions $\Lambda^\natural$ provide sound over-approximations $\gamma(\Lambda^\natural)$ of $\Lambda$ (by means of a concretization function $\gamma$): $\Lambda \subseteq \gamma(\Lambda^\natural)$. For a *trace property* $\mathcal{S}$, an over-approximation $\gamma([\![P]\!]^\natural)$ of the semantics $[\![P]\!]$ of a program $P$ allows a sound validation of the property: since $[\![P]\!] \subseteq \gamma([\![P]\!]^\natural)$, we have that $\gamma([\![P]\!]^\natural) \subseteq \mathcal{S} \Rightarrow [\![P]\!] \subseteq \mathcal{S}$ and so, if $\gamma([\![P]\!]^\natural) \subseteq \mathcal{S}$, we can conclude that $P \models \mathcal{S}$ (cf. Sect. 3). This conclusion is also valid for all other *subset-closed* properties [11]: since by definition $\gamma([\![P]\!]^\natural) \in \mathcal{S} \Rightarrow \forall T \subseteq \gamma([\![P]\!]^\natural)\colon T \in \mathcal{S}$, we have that $\gamma([\![P]\!]^\natural) \in \mathcal{S} \Rightarrow [\![P]\!] \in \mathcal{S}$ (and so we can conclude that $P \models \mathcal{S}$). However, for program properties that are not subset-closed, we have that $\gamma([\![P]\!]^\natural) \in \mathcal{S} \not\Rightarrow [\![P]\!] \in \mathcal{S}$ [28] and so we cannot conclude that $P \models \mathcal{S}$, even if $\gamma([\![P]\!]^\natural) \in \mathcal{S}$ (cf. Eq. 2).

We have seen in the previous section that input data usage is not a trace property. The example below shows that it is *not* a subset-closed property either.

*Example 2.* Let us consider again the program <sup>P</sup> and its semantics [[P]]science and [[P]]math shown in Example 1. We have seen in Example <sup>1</sup> that the semantics [[P]]science belongs to the data usage property <sup>N</sup> : [[P]]science ∈ N . Let us consider now the following subset <sup>T</sup> of [[P]]science:

$$T = \{(\mathtt{t}{\scriptstyle\bullet}) \ldots (\mathtt{t}\mathtt{t}),\ (\mathtt{f}{\scriptstyle\bullet}) \ldots (\mathtt{f}\mathtt{t}),\ (\mathtt{f}{\scriptstyle\bullet}) \ldots (\mathtt{f}\mathtt{f})\}$$

In this case, the input variable science is used. Indeed, we can observe that $T$ coincides with $[\![P]\!]_{\texttt{math}}$ (except for the considered input variable). Thus $T \notin \mathcal{N}$ even though $T \subseteq [\![P]\!]_{\texttt{science}}$. □
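The counterexample can be replayed mechanically. The sketch below (illustrative Python, not from the paper: each trace is abstracted to a pair of the initial science value and the final passing value, as in the example) checks the unused condition on the full semantics and on its subset $T$:

```python
# Sketch of Example 2 (illustrative encoding): a trace is abstracted to
# (initial value of science, final value of passing).

def unused_science(T):
    """Eq. 3 on the projected traces: every outcome must be feasible from
    both initial values of science."""
    return all((not s, p) in T for (s, p) in T)

# Full semantics: both results are feasible from both initial values.
P_science = {(True, True), (False, True), (True, False), (False, False)}
# A subset in which the result False is feasible only from science = False.
T = {(True, True), (False, True), (False, False)}

assert unused_science(P_science)                  # science unused
assert T <= P_science and not unused_science(T)   # but used for the subset
```

The subset fails the check precisely because the outcome False is no longer reachable from science = True, which is why input data usage is not subset-closed.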

Since input data usage is not subset-closed, we are in the unfortunate situation that we cannot use the standard abstract interpretation framework to soundly prove that a program does not use (some of) its input data using an over-approximation of the semantics of the program: $\gamma([\![P]\!]^\natural) \in \mathcal{N}_J \not\Rightarrow [\![P]\!] \in \mathcal{N}_J$.

We solve this problem in the next section, by lifting the trace semantics $[\![P]\!] \in \mathcal{P}(\Sigma^{+\infty})$ of a program $P$ (i.e., a set of traces) to a set of sets of traces $\{\!\{P\}\!\} \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ [28]. In this setting, a program $P$ satisfies a property $\mathcal{H}$ if and only if its semantics $\{\!\{P\}\!\}$ is a subset of $\mathcal{H}$:

$$P \models \mathcal{H} \Leftrightarrow \{\!\{P\}\!\} \subseteq \mathcal{H} \tag{5}$$

As we will explain in the next section, an over-approximation $\gamma(\{\!\{P\}\!\}^\natural)$ of $\{\!\{P\}\!\}$ now again allows a sound validation of the property: since $\{\!\{P\}\!\} \subseteq \gamma(\{\!\{P\}\!\}^\natural)$, we have that $\gamma(\{\!\{P\}\!\}^\natural) \subseteq \mathcal{H} \Rightarrow \{\!\{P\}\!\} \subseteq \mathcal{H}$ (and so we can conclude that $P \models \mathcal{H}$).

More specifically, in the next section we define a program semantics $\{\!\{P\}\!\}$ that precisely captures which subset $J$ of the input variables is unused by a program $P$. In later sections, we present further abstractions $\{\!\{P\}\!\}^\natural$ that over-approximate the subset of the input variables that *may be used* by $P$, and thus allow a sound validation of an *under-approximation* $J'$ of $J$: $\gamma(\{\!\{P\}\!\}^\natural) \subseteq \mathcal{N}_{J'} \Rightarrow \{\!\{P\}\!\} \subseteq \mathcal{N}_{J'}$. In other words, this means that every input variable reported as unused by an abstraction is indeed not used by the program.

### **5 Outcome Semantics**

We lift the trace semantics $\Lambda$ to a set of sets of traces by *partitioning*. The *partitioning abstraction* $\alpha_Q\colon \mathcal{P}(\Sigma^{+\infty}) \to \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ of a set of traces $T$ is:

$$\alpha_Q(T) \stackrel{\text{def}}{=} \{T \cap C \mid C \in Q\} \tag{6}$$

where $Q \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ is a *partition* of sequences of program states.

More specifically, to reason about input data usage of a program $P$, we lift the trace semantics $[\![P]\!]$ to $\{\!\{P\}\!\}$ by partitioning it into sets of traces that yield the same program outcome. The key insight behind this idea is that, given an input variable $i$, the initial states of all traces in a partition give all initial values for $i$ that yield a program outcome; the variable $i$ is unused if and only if these initial values are all the possible values for $i$ (or the set of values is empty because the outcome is unfeasible, cf. Eq. 3). Thus, if the trace semantics $[\![P]\!]$ of a program $P$ belongs to the input data usage property $\mathcal{N}_J$, then each partition in $\{\!\{P\}\!\}$ must also belong to $\mathcal{N}_J$, and vice versa: we have that $[\![P]\!] \in \mathcal{N}_J \Leftrightarrow \{\!\{P\}\!\} \subseteq \mathcal{N}_J$, which is precisely what we want (cf. Eq. 5).

Let T<sup>+</sup><sub>o=v</sub> denote the subset of the finite sequences of program states in T ∈ P(Σ<sup>+∞</sup>) with value v for the output variable o in their outcome (i.e., their final state): T<sup>+</sup><sub>o=v</sub> ≝ {σ ∈ T<sup>+</sup> | σ[ω](o) = v}. We define the *outcome partition* O ∈ P(P(Σ<sup>+∞</sup>)) of sequences of program states:

$$O \stackrel{\text{def}}{=} \left\{ \Sigma^+\_{o\_1=v\_1,\ldots,o\_k=v\_k} \mid v\_1,\ldots,v\_k \in \mathcal{V} \right\} \cup \left\{ \Sigma^\omega \right\}.$$

where V is the set of possible values of the output variables o<sub>1</sub>,…,o<sub>k</sub> (cf. Sect. 3). The partition contains all sets of finite sequences that agree on the values of the output variables in their outcome, and all infinite sequences of program states (i.e., all sequences with outcome ⊥). We instantiate α<sub>Q</sub> above with the outcome partition to obtain the *outcome abstraction* α• : P(Σ<sup>+∞</sup>) → P(P(Σ<sup>+∞</sup>)):

$$\alpha\_{\bullet}(T) \stackrel{\text{def}}{=} \left\{ T^{+}\_{o\_1=v\_1,\ldots,o\_k=v\_k} \mid v\_1,\ldots,v\_k \in \mathcal{V} \right\} \cup \left\{ T^{\omega} \right\} \tag{7}$$

*Example 3.* The program P of Example 1 has only one output variable passing, with boolean value t or f. Let us consider again the trace semantics [[P]]<sub>math</sub> shown in Example 1. Its outcome abstraction α•([[P]]<sub>math</sub>) is:

$$\alpha\_{\bullet}([P]\_{\mathtt{math}}) = \{ \emptyset, \{ (\mathtt{F}\bullet) \ldots (\mathtt{FF}) \}, \{ (\mathtt{T}\bullet) \ldots (\mathtt{TT}), (\mathtt{F}\bullet) \ldots (\mathtt{FT}) \} \}$$

Note that all traces with different result values for the output variable passing belong to different sets of traces (i.e., partitions) in α•([[P]]<sub>math</sub>). The empty set corresponds to the (infeasible) non-terminating outcome of the program. -
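In code, the outcome abstraction is the partitioning abstraction instantiated with "group finite traces by the value of the output in their final state". A minimal Python sketch for a single output variable (states reduced to the current value of o, an illustrative simplification):

```python
def alpha_outcome(T_fin, T_inf, values):
    """Outcome abstraction (Eq. 7) for a single output variable o:
    one block T+_{o=v} per value v, plus one block of infinite traces.
    A finite trace is a tuple of states; here a state is just the
    current value of o."""
    return [{t for t in T_fin if t[-1] == v} for v in values] + [T_inf]

# Three terminating traces of a boolean output, no infinite traces:
blocks = alpha_outcome({(True, True), (False, True), (False, False)},
                       T_inf=set(), values=[True, False])
# blocks[0] groups the traces with outcome True, blocks[1] those with
# outcome False, and blocks[2] (empty here) the non-terminating ones.
```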

We can now use the outcome abstraction α• to define the *outcome semantics* Λ• ∈ P(P(Σ<sup>+∞</sup>)) as an abstraction of the trace semantics Λ:

**Definition 1.** *The* outcome semantics Λ• ∈ P(P(Σ<sup>+∞</sup>)) *is defined as:*

$$\Lambda\_{\bullet} \stackrel{\text{def}}{=} \alpha\_{\bullet}(\Lambda) \tag{8}$$

*where* α• *is the outcome abstraction (cf. Eq. 7) and* Λ ∈ P(Σ<sup>+∞</sup>) *is the trace semantics (cf. Eq. 1).*

The outcome semantics contains the set of all infinite traces and all sets of finite traces that agree on the value of the output variables in their outcome.

In the following, we express the outcome semantics Λ• in a constructive fixpoint form. This allows us to later derive further abstractions of Λ• by *fixpoint transfer* and *fixpoint approximation* [12]. Given a set of sets of traces S, we write S<sup>+</sup><sub>o=v</sub> ≝ {T ∈ S | T = T<sup>+</sup><sub>o=v</sub>} for the selection of the sets of traces in S that agree on the value v of the output variable o in their outcome, and S<sup>ω</sup> ≝ {T ∈ S | T = T<sup>ω</sup>} for the selection of the sets of infinite traces in S. When S<sup>+</sup><sub>o=v</sub> (resp. S<sup>ω</sup>) contains a single set of traces T, we abuse notation and write S<sup>+</sup><sub>o=v</sub> (resp. S<sup>ω</sup>) to also denote T. The following result gives a fixpoint definition of Λ• in the complete lattice ⟨P(P(Σ<sup>+∞</sup>)), ⊑, ⊔, ⊓, {Σ<sup>ω</sup>, ∅}, {∅, Σ<sup>+</sup>}⟩, where the computational order ⊑ is defined (similarly to the computational order on sets of traces, cf. Sect. 2) as:

$$S\_1 \sqsubseteq S\_2 \stackrel{\text{def}}{=} \bigwedge\_{v\_1, \dots, v\_k \in \mathcal{V}} S\_{1\,o\_1 = v\_1, \dots, o\_k = v\_k}^{+} \subseteq S\_{2\,o\_1 = v\_1, \dots, o\_k = v\_k}^{+} \;\wedge\; S\_1^{\omega} \supseteq S\_2^{\omega}$$

**Theorem 1.** *The outcome semantics* Λ• ∈ P(P(Σ<sup>+∞</sup>)) *can be expressed as a least fixpoint in* ⟨P(P(Σ<sup>+∞</sup>)), ⊑, ⊔, ⊓, {Σ<sup>ω</sup>, ∅}, {∅, Σ<sup>+</sup>}⟩ *as:*

$$\begin{aligned} \Lambda\_{\bullet} &= \operatorname{lfp}^{\sqsubseteq} \, \Theta\_{\bullet} \\ \Theta\_{\bullet}(S) &\stackrel{\text{def}}{=} \{ \Omega\_{o\_1 = v\_1, \dots, o\_k = v\_k} \mid v\_1, \dots, v\_k \in \mathcal{V} \} \mathbin{\dot\cup} \{ \tau \,; T \mid T \in S \} \end{aligned} \tag{9}$$

*where* $S\_1 \mathbin{\dot\cup} S\_2 \stackrel{\text{def}}{=} \{ S\_{1\,o\_1 = v\_1, \dots, o\_k = v\_k}^{+} \cup S\_{2\,o\_1 = v\_1, \dots, o\_k = v\_k}^{+} \mid v\_1, \dots, v\_k \in \mathcal{V} \} \cup \{ S\_1^{\omega} \cup S\_2^{\omega} \}$*.*

Figure 4 illustrates the first fixpoint iterates of the outcome semantics for a single output variable o. The fixpoint iteration starts from the partition containing the set of all infinite sequences of program states and the empty set (which

$$\begin{aligned} S\_0 &= \{ \Sigma^{\omega}, \emptyset \} \\ S\_1 &= \{ \{ \Omega\_{o=v} \} \mid v \in \mathcal{V} \} \cup \{ \tau \,; \Sigma^{\omega} \} \\ S\_2 &= \{ \{ \Omega\_{o=v} \cup \tau \,; \Omega\_{o=v} \} \mid v \in \mathcal{V} \} \cup \{ \tau \,; \tau \,; \Sigma^{\omega} \} \end{aligned}$$

**Fig. 4.** First iterates of the outcome semantics Λ• for a single output variable o.

represents an empty set of finite traces). At the first iteration, the empty set is replaced with a partition of the final states Ω based on the value v of the output variable o, while the infinite sequences are extended by prepending transitions to them (similarly to the trace semantics, cf. Eq. 1). At each subsequent iteration, all sequences contained in each partition are further extended, and the final states that agree on the value v of o are again added to the matching set of traces that agree on v in their outcome. At the limit, we obtain a partition containing the set of all infinite traces and all sets of finite traces that agree on the value v of the output variable o in their outcome.
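The iteration just described can be replayed concretely. The Python sketch below (the transition relation and final states are invented for illustration, and we assume each final state carries an output value) iterates the finite-trace part of Θ• to a fixpoint:

```python
# Kleene iteration for the finite-trace blocks of the outcome semantics
# (the first set in Eq. 9), on a toy transition system.
TAU = {(0, 1), (1, 2), (0, 3)}        # transition relation τ
FINAL = {2: "even", 3: "odd"}         # final states Ω and their output value

def step(blocks):
    """One application of the finite-trace part of Θ•: extend every trace
    of every block by prepending a transition, then re-add the length-1
    traces (final states) with the matching outcome."""
    new = {}
    for v, traces in blocks.items():
        extended = {(s,) + t for t in traces for (s, s2) in TAU if s2 == t[0]}
        seed = {(f,) for f, fv in FINAL.items() if fv == v}
        new[v] = extended | seed
    return new

blocks = {v: set() for v in set(FINAL.values())}   # start from empty blocks
while True:
    nxt = step(blocks)
    if nxt == blocks:
        break
    blocks = nxt
# blocks["even"] now holds every trace reaching final state 2.
```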

To prove Theorem 1 we first need to show that the outcome abstraction α• preserves least upper bounds of non-empty sets of sets of traces.

**Lemma 1.** *The outcome abstraction* α• *is Scott-continuous.*

*Proof.* We need to show that for any non-empty ascending chain C of sets of traces with least upper bound ⊔C, we have that α•(⊔C) = ⊔{α•(T) | T ∈ C}, that is, α•(⊔C) is the least upper bound of α•(C), the image of C via α•.

First, we know that α• is monotonic, i.e., for any two sets of traces T<sub>1</sub> and T<sub>2</sub> we have T<sub>1</sub> ⊑ T<sub>2</sub> ⇒ α•(T<sub>1</sub>) ⊑ α•(T<sub>2</sub>). Since ⊔C is the least upper bound of C, for any set T in C we have that T ⊑ ⊔C and, since α• is monotonic, we have that α•(T) ⊑ α•(⊔C). Thus α•(⊔C) is an upper bound of {α•(T) | T ∈ C}.

To show that α•(⊔C) is the least upper bound of α•(C), we need to show that for any other upper bound U of α•(C) we have α•(⊔C) ⊑ U. Let us assume towards a contradiction that α•(⊔C) ⋢ U. Then there exist T<sub>1</sub> ∈ α•(⊔C) and T<sub>2</sub> ∈ U such that T<sub>1</sub> ⋢ T<sub>2</sub>, i.e., T<sub>1</sub><sup>+</sup> ⊃ T<sub>2</sub><sup>+</sup> or T<sub>1</sub><sup>ω</sup> ⊂ T<sub>2</sub><sup>ω</sup>. Let us first assume that T<sub>1</sub><sup>+</sup> ⊃ T<sub>2</sub><sup>+</sup>. By definition of α•, we observe that T<sub>1</sub> is a block of the partition of ⊔C and, since ⊔C is the least upper bound of C, U cannot be an upper bound of α•(C) (since T<sub>2</sub> does not contain enough finite traces). Similarly, if T<sub>1</sub><sup>ω</sup> ⊂ T<sub>2</sub><sup>ω</sup>, then U cannot be an upper bound of α•(C) (since T<sub>2</sub> contains too many infinite traces). Thus, we must have α•(⊔C) ⊑ U and we can conclude that α•(⊔C) is the least upper bound of α•(C). 

We can now prove Theorem 1 by Kleenian fixpoint transfer [12].

*Proof (Sketch).* The proof follows by Kleenian fixpoint transfer. We have that ⟨P(P(Σ<sup>+∞</sup>)), ⊑, ⊔, ⊓, {Σ<sup>ω</sup>, ∅}, {∅, Σ<sup>+</sup>}⟩ is a complete lattice and that φ<sup>+∞</sup> (cf. Eq. 1) and Θ• (cf. Eq. 9) are monotonic functions. Additionally, we have that the outcome abstraction α• (cf. Eq. 7) is Scott-continuous (cf. Lemma 1) and such that α•(Σ<sup>ω</sup>) = {Σ<sup>ω</sup>, ∅} and α• ∘ φ<sup>+∞</sup> = Θ• ∘ α•. Then, by Kleenian fixpoint transfer, we have that α•(Λ) = α•(lfp φ<sup>+∞</sup>) = lfp<sup>⊑</sup> Θ•. Thus, we can conclude that Λ• = lfp<sup>⊑</sup> Θ•. 

Finally, we show that the outcome semantics Λ• is sound and complete for proving that a program does not use (a subset of) its input variables.

**Theorem 2.** *A program does not use a subset* J *of its input variables if and only if its outcome semantics* Λ• *is a subset of* N<sub>J</sub>*:*

$$P \models \mathcal{N}\_J \Leftrightarrow \Lambda\_\bullet \subseteq \mathcal{N}\_J$$

*Proof (Sketch).* The proof follows immediately from the definition of N<sub>J</sub> (cf. Eq. 3 and Sect. 4) and the definition of Λ• (cf. Eq. 8). 

*Example 4.* Let us consider again the program P and its trace semantics [[P]]<sub>science</sub> shown in Example 1. The corresponding outcome semantics α•([[P]]<sub>science</sub>) is:

$$\alpha\_{\bullet}([P]\_{\mathtt{science}}) = \{ \emptyset, \{ (\mathtt{T}\bullet) \ldots (\mathtt{TF}), (\mathtt{F}\bullet) \ldots (\mathtt{FF}) \}, \{ (\mathtt{T}\bullet) \ldots (\mathtt{TT}), (\mathtt{F}\bullet) \ldots (\mathtt{FT}) \} \}$$

Note that all sets of traces in α•([[P]]<sub>science</sub>) belong to N<sub>{science}</sub>: the initial states of all traces in a non-empty partition contain all possible initial values (t or f) for the input variable science. Thus, P satisfies N<sub>{science}</sub> and, indeed, the input variable science is unused by P. -

As discussed in Sect. 4, we can now again use the standard framework of abstract interpretation to soundly over-approximate Λ• and prove that a program does not use (some of) its input data. In the next section, we propose an abstraction that remains sound and complete for input data usage. Further sound but not complete abstractions are presented in later sections.

### **6 Dependency Semantics**

We observe that, to reason about input data usage, it is not necessary to consider all intermediate state computations between the initial state of a trace and its outcome. Thus, we can further abstract the outcome semantics Λ• into a set Λ⤳ of (dependency) relations between initial states and outcomes of a set of traces.

We lift the abstraction defined for this purpose on sets of traces [12] to α⤳ : P(P(Σ<sup>+∞</sup>)) → P(P(Σ × Σ<sub>⊥</sub>)) on sets of sets of traces:

$$\alpha\_{\leadsto}(S) \stackrel{\text{def}}{=} \{ \{ \langle \sigma[0], \sigma[\omega] \rangle \in \Sigma \times \Sigma\_{\perp} \mid \sigma \in T \} \mid T \in S \} \tag{10}$$

where Σ<sub>⊥</sub> ≝ Σ ∪ {⊥}. The *dependency abstraction* α⤳ ignores all intermediate states between the initial state σ[0] and the outcome σ[ω] of all traces σ in all partitions T of S. Observe that a trace σ that consists of a single state s is abstracted as a pair ⟨s, s⟩. The corresponding dependency concretization function γ⤳ : P(P(Σ × Σ<sub>⊥</sub>)) → P(P(Σ<sup>+∞</sup>)) over-approximates the original sets of traces by inserting arbitrary intermediate states:

$$\gamma\_{\leadsto}(S) \stackrel{\text{def}}{=} \left\{ T \in \mathcal{P}\left(\Sigma^{+\infty}\right) \mid \left\{ \langle \sigma[0], \sigma[\omega] \rangle \in \Sigma \times \Sigma\_{\perp} \mid \sigma \in T \right\} \in S \right\} \tag{11}$$

*Example 5.* Let us consider again the program of Example 1 and its outcome semantics α•([[P]]<sub>math</sub>) shown in Example 3. Its dependency abstraction is:

$$\alpha\_{\leadsto}(\alpha\_{\bullet}([P]\_{\mathtt{math}})) = \{ \emptyset, \{ \langle \mathtt{F}\bullet, \mathtt{FF} \rangle \}, \{ \langle \mathtt{T}\bullet, \mathtt{TT} \rangle, \langle \mathtt{F}\bullet, \mathtt{FT} \rangle \} \}$$

which explicitly ignores intermediate program states. -
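Equation 10 is straightforward to transcribe: per block, keep only the first and last element of each trace. A Python sketch (finite traces as tuples of states; infinite traces modeled, purely for illustration, as tuples ending in a ⊥ marker):

```python
BOTTOM = None   # stands for the outcome ⊥ of a non-terminating trace

def alpha_dep(S):
    """Dependency abstraction (Eq. 10): per block, abstract every trace
    to the pair (initial state, outcome); a single-state trace (s,)
    becomes the pair (s, s)."""
    return [{(t[0], t[-1]) for t in T} for T in S]

S = [{("a", "b", "c"), ("a", "c")},   # two finite traces, same endpoints
     {("b", BOTTOM)}]                 # one non-terminating trace
# alpha_dep(S) merges the first block into the single pair ("a", "c"):
# the intermediate state "b" of the longer trace is ignored.
```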

Using α⤳, we can now define the *dependency semantics* Λ⤳ ∈ P(P(Σ × Σ<sub>⊥</sub>)) as an abstraction of the outcome semantics Λ•.

**Definition 2.** *The* dependency semantics Λ⤳ ∈ P(P(Σ × Σ<sub>⊥</sub>)) *is defined as:*

$$\Lambda\_{\leadsto} \stackrel{\text{def}}{=} \alpha\_{\leadsto}(\Lambda\_{\bullet}) \tag{12}$$

*where* Λ• ∈ P(P(Σ<sup>+∞</sup>)) *is the outcome semantics (cf. Eq. 8) and* α⤳ *is the dependency abstraction (cf. Eq. 10).*

Neither the Kleenian fixpoint transfer nor the Tarskian fixpoint transfer can be used to obtain a fixpoint definition of the dependency semantics; instead, we proceed by union of disjoint fixpoints [12]. To this end, we observe that the outcome semantics Λ• can be equivalently expressed as follows:

$$\begin{aligned} \Lambda\_{\bullet} &= \Lambda\_{\bullet}^{+} \cup \Lambda\_{\bullet}^{\omega} = \operatorname{lfp}\_{\{\emptyset\}}^{\sqsubseteq} \Theta\_{\bullet}^{+} \cup \operatorname{lfp}\_{\{\Sigma^{\omega}\}}^{\sqsubseteq} \Theta\_{\bullet}^{\omega} \\ \Theta\_{\bullet}^{+}(S) &\stackrel{\text{def}}{=} \{ \Omega\_{o\_{1} = v\_{1}, \dots, o\_{k} = v\_{k}} \mid v\_{1}, \dots, v\_{k} \in \mathcal{V} \} \mathbin{\dot\cup} \{ \tau \,; T \mid T \in S \} \\ \Theta\_{\bullet}^{\omega}(S) &\stackrel{\text{def}}{=} \{ \tau \,; T \mid T \in S \} \end{aligned} \tag{13}$$

where Λ•<sup>+</sup> and Λ•<sup>ω</sup> separately compute the set of all sets of finite traces that agree on their outcome, and the set of all infinite traces, respectively.

In the following, given a set of traces T ∈ P(Σ<sup>+∞</sup>) and its dependency abstraction α⤳(T), we abuse notation and write T<sup>+</sup> (resp. T<sup>ω</sup>) to also denote α⤳(T)<sup>+</sup> ≝ α⤳(T) ∩ (Σ × Σ) (resp. α⤳(T)<sup>ω</sup> ≝ α⤳(T) ∩ (Σ × {⊥})). Similarly, we reuse the symbols for the computational order ⊑, least upper bound ⊔, and greatest lower bound ⊓, instead of their abstractions. We can now use the Kleenian and Tarskian fixpoint transfer to separately derive fixpoint definitions of α⤳(Λ•<sup>+</sup>) and α⤳(Λ•<sup>ω</sup>) in ⟨P(P(Σ × Σ<sub>⊥</sub>)), ⊑, ⊔, ⊓, {Σ × {⊥}, ∅}, {∅, Σ × Σ}⟩.

**Lemma 2.** *The abstraction* Λ⤳<sup>+</sup> ≝ α⤳(Λ•<sup>+</sup>) ∈ P(P(Σ × Σ)) *can be expressed as a least fixpoint in* ⟨P(P(Σ × Σ<sub>⊥</sub>)), ⊑, ⊔, ⊓, {Σ × {⊥}, ∅}, {∅, Σ × Σ}⟩ *as:*

$$\begin{aligned} \Lambda\_{\leadsto}^{+} &= \operatorname{lfp}\_{\{\emptyset\}}^{\sqsubseteq} \Theta\_{\leadsto}^{+} \\ \Theta\_{\leadsto}^{+}(S) &\stackrel{\text{def}}{=} \{ \Omega\_{o\_{1} = v\_{1}, \dots, o\_{k} = v\_{k}} \times \Omega\_{o\_{1} = v\_{1}, \dots, o\_{k} = v\_{k}} \mid v\_{1}, \dots, v\_{k} \in \mathcal{V} \} \mathbin{\dot\cup} \{ \tau \circ R \mid R \in S \} \end{aligned} \tag{14}$$

*Proof (Sketch).* By Kleenian fixpoint transfer (cf. Theorem 17 in [12]). 

**Lemma 3.** *The abstraction* Λ⤳<sup>ω</sup> ≝ α⤳(Λ•<sup>ω</sup>) ∈ P(P(Σ × Σ<sub>⊥</sub>)) *can be expressed as a least fixpoint in* ⟨P(P(Σ × Σ<sub>⊥</sub>)), ⊑, ⊔, ⊓, {Σ × {⊥}, ∅}, {∅, Σ × Σ}⟩ *as:*

$$\begin{aligned} \Lambda\_{\leadsto}^{\omega} &= \operatorname{lfp}\_{\{\Sigma \times \{\bot\}\}}^{\sqsubseteq} \Theta\_{\leadsto}^{\omega} \\ \Theta\_{\leadsto}^{\omega}(S) &\stackrel{\text{def}}{=} \{ \tau \circ R \mid R \in S \} \end{aligned} \tag{15}$$

*Proof (Sketch).* By Tarskian fixpoint transfer (cf. Theorem 18 in [12]). 

The fixpoint iteration for Λ⤳<sup>+</sup> starts from the set containing only the empty relation. At the first iteration, the empty relation is replaced by all relations between pairs of final states that agree on the values of the output variables. At each subsequent iteration, all relations are combined with the transition relation to obtain relations between the initial and final states of increasingly longer traces. At the limit, we obtain the set of all relations between the initial and the final states of the program that agree on the final value of the output variables. The fixpoint iteration for Λ⤳<sup>ω</sup> starts from the set containing (the set of) all pairs of a state and the ⊥ outcome, and each iteration discards more and more pairs whose initial states do not belong to infinite traces of the program.
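The body of both fixpoints is composition with the transition relation τ. A small Python sketch (the transition relation and final state are invented for illustration) shows how the pairs of Λ⤳<sup>+</sup> grow backwards from the final states:

```python
def compose(tau, R):
    """τ ∘ R: prepend one transition (s, s') to every dependency pair
    (s', out) in R, yielding pairs for traces one step longer."""
    return {(s, out) for (s, s2) in tau for (ini, out) in R if s2 == ini}

TAU = {(0, 1), (1, 2)}      # toy transition relation
R0 = {(2, 2)}               # Ω × Ω for the single final state 2
R1 = compose(TAU, R0)       # pairs of traces of length 2 ending in 2
R2 = compose(TAU, R1)       # pairs of traces of length 3 ending in 2
# The fixpoint accumulates R0 ∪ R1 ∪ R2: all (initial, final) pairs.
```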

Now we can use Lemmas 2 and 3 to express the dependency semantics Λ⤳ in a constructive fixpoint form (as the union of Λ⤳<sup>+</sup> and Λ⤳<sup>ω</sup>).

**Theorem 3.** *The dependency semantics* Λ⤳ ∈ P(P(Σ × Σ<sub>⊥</sub>)) *can be expressed as a least fixpoint in* ⟨P(P(Σ × Σ<sub>⊥</sub>)), ⊑, ⊔, ⊓, {Σ × {⊥}, ∅}, {∅, Σ × Σ}⟩ *as:*

$$\begin{aligned} \Lambda\_{\leadsto} &= \Lambda\_{\leadsto}^{+} \cup \Lambda\_{\leadsto}^{\omega} = \operatorname{lfp}\_{\{\Sigma \times \{\bot\}, \emptyset\}}^{\sqsubseteq} \Theta\_{\leadsto} \\ \Theta\_{\leadsto}(S) &\stackrel{\text{def}}{=} \{ \Omega\_{o\_1 = v\_1, \dots, o\_k = v\_k} \times \Omega\_{o\_1 = v\_1, \dots, o\_k = v\_k} \mid v\_1, \dots, v\_k \in \mathcal{V} \} \mathbin{\dot\cup} \{ \tau \circ R \mid R \in S \} \end{aligned} \tag{16}$$

*Proof (Sketch).* The proof follows immediately from Lemmas 2 and 3. 

Finally, we show that the dependency semantics Λ⤳ is sound and complete for proving that a program does not use (a subset of) its input variables.

**Theorem 4.** *A program does not use a subset* J *of its input variables if and only if the image via* γ⤳ *of its dependency semantics* Λ⤳ *is a subset of* N<sub>J</sub>*:*

$$P \models \mathcal{N}\_J \Leftrightarrow \gamma\_{\leadsto}(\Lambda\_{\leadsto}) \subseteq \mathcal{N}\_J$$

*Proof (Sketch).* The proof follows from the definition of Λ⤳ (cf. Eq. 12) and γ⤳ (cf. Eq. 11), and from Theorem 2. 

*Example 6.* Let us consider again the program P and its outcome semantics α•([[P]]<sub>science</sub>) from Example 4. The corresponding dependency semantics is:

$$\alpha\_{\leadsto}(\alpha\_{\bullet}([P]\_{\mathtt{science}})) = \{ \emptyset, \{ \langle \mathtt{T}\bullet, \mathtt{TF} \rangle, \langle \mathtt{F}\bullet, \mathtt{FF} \rangle \}, \{ \langle \mathtt{T}\bullet, \mathtt{TT} \rangle, \langle \mathtt{F}\bullet, \mathtt{FT} \rangle \} \}$$

and, by definition of γ⤳, its concretization γ⤳(α⤳(α•([[P]]<sub>science</sub>))) is an over-approximation of α•([[P]]<sub>science</sub>). In particular, since intermediate state computations are irrelevant for deciding the input data usage property, each set of traces in γ⤳(α⤳(α•([[P]]<sub>science</sub>))) over-approximates exactly one set in α•([[P]]<sub>science</sub>) with the same set of initial states and the same outcome. Thus, in this case, we can observe that all sets of traces in γ⤳(α⤳(α•([[P]]<sub>science</sub>))) belong to N<sub>{science}</sub> and correctly conclude that P does not use the variable science. -

At this point, we have a sound and complete program semantics that captures only the minimal information needed to decide which input variables are unused by a program. In the rest of the paper, we present various static analyses for input data usage by means of sound abstractions of this semantics, which *under-approximate* (resp. over-approximate) the subset of the input variables that are *unused* (resp. used) by a program.

### **7 Input Data Usage Abstractions**

We introduce a simple sequential programming language with boolean variables, which we use for illustration throughout the rest of the paper:

$$\begin{array}{lll} e & ::= v \mid x \mid \mathtt{not}\ e \mid e\ \mathtt{and}\ e \mid e\ \mathtt{or}\ e & \text{(expressions)} \\ s & ::= \mathtt{skip} \mid x = e \mid \mathtt{if}\ e\colon s\ \mathtt{else}\colon s \mid \mathtt{while}\ e\colon s \mid s\ s & \text{(statements)} \end{array}$$

where v ranges over boolean values, and x ranges over program variables. The skip statement, which does nothing, is a placeholder that is useful, for instance, for writing a conditional statement without an else branch: if e: s else: skip. In the following, we often simply write if e: s instead of if e: s else: skip. Note that our work is not limited by the choice of a particular programming language, as the formal treatment in the previous sections is language independent.
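For later reference, the expression fragment of this grammar can be rendered directly as a small Python AST with an evaluator; the class and function names below are our own, chosen for illustration:

```python
from dataclasses import dataclass

# One class per production of the expression grammar.
@dataclass(frozen=True)
class Val:
    value: bool

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Not:
    arg: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

def evaluate(e, env):
    """Evaluate a boolean expression under an environment env
    mapping variable names to boolean values."""
    if isinstance(e, Val): return e.value
    if isinstance(e, Var): return env[e.name]
    if isinstance(e, Not): return not evaluate(e.arg, env)
    if isinstance(e, And): return evaluate(e.left, env) and evaluate(e.right, env)
    if isinstance(e, Or):  return evaluate(e.left, env) or evaluate(e.right, env)

# (not english) or bonus, with english = True and bonus = True:
r = evaluate(Or(Not(Var("english")), Var("bonus")),
             {"english": True, "bonus": True})
```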

In Sects. 8 and 9, we show that existing static analyses based on dependencies [6,20] are abstractions of the dependency semantics Λ⤳. We define each abstraction Λ♮ over a partially ordered set ⟨A, ⊑<sub>A</sub>⟩ called an *abstract domain*. More specifically, for each program statement s, we define a *transfer function* Θ♮[[s]] : A → A, and the abstraction Λ♮ is the composition of the transfer functions of all statements in a program. We derive a more precise static analysis, similar to the dependency analyses used for program slicing [37], in Sect. 10. Finally, Sect. 11 demonstrates the value of expressing such analyses as abstract domains by combining them with an existing abstraction of compound data structures such as arrays and lists [16] to detect unused chunks of input data.

### **8 Secure Information Flow Abstractions**

Secure information flow analysis [18] aims at proving that a program will not leak sensitive information. Most analyses focus on proving *non-interference* [35] by classifying program variables into different security levels [17], and ensuring the absence of information flow from variables with higher security level to variables with lower security level. The most basic classification comprises a low security level L, and a high security level H: program variables classified as L are public information, while variables classified as H are private information.

In our context, if we classify input variables as H and all other variables as L, possibilistic non-interference [21] coincides with the input data usage property N (cf. Eq. 4) *restricted to consider only terminating programs*. However, in general, (possibilistic) non-interference is too strong for our purposes, as it requires that *none* of the input variables is used by a program. We illustrate this using as an example a non-interference analysis recently proposed by Assaf et al. [6], which is conveniently formalized in the framework of abstract interpretation. We briefly present here a version of the originally proposed analysis, simplified to consider only the security levels L and H, and we point out the significance of the definitions for input data usage.

Let L ≝ {L, H} be the set of security levels, and let the set X of all program variables be partitioned into a set X<sub>L</sub> of variables classified as L and a set X<sub>H</sub> of variables classified as H (i.e., the input variables). A dependency constraint L ⤳ x expresses that the current value of the variable x depends only on the initial values of variables having at most security level L (i.e., it does not depend on the initial value of any of the input variables). The non-interference analysis Λ<sub>F</sub> proposed by Assaf et al. is a *forward analysis* in the lattice ⟨P(F), ⊑<sub>F</sub>, ⊔<sub>F</sub>⟩, where F ≝ {L ⤳ x | x ∈ X} is the set of all dependency constraints, S<sub>1</sub> ⊑<sub>F</sub> S<sub>2</sub> ≝ S<sub>1</sub> ⊇ S<sub>2</sub>, and S<sub>1</sub> ⊔<sub>F</sub> S<sub>2</sub> ≝ S<sub>1</sub> ∩ S<sub>2</sub>. The transfer function Θ<sub>F</sub>[[s]] : P(F) → P(F) for each statement s in our simple programming language is defined as follows:

$$\begin{aligned} \Theta\_{\mathrm{F}}[\mathtt{skip}](S) &\stackrel{\text{def}}{=} S \\ \Theta\_{\mathrm{F}}[x = e](S) &\stackrel{\text{def}}{=} \{ L \leadsto y \in S \mid y \neq x \} \cup \{ L \leadsto x \mid \mathrm{V}\_{\mathrm{F}}[e]S \} \\ \Theta\_{\mathrm{F}}[\mathtt{if}\ e\colon s\_{1}\ \mathtt{else}\colon s\_{2}](S) &\stackrel{\text{def}}{=} \begin{cases} \Theta\_{\mathrm{F}}[s\_{1}](S) \sqcup\_{\mathrm{F}} \Theta\_{\mathrm{F}}[s\_{2}](S) & \text{if } \mathrm{V}\_{\mathrm{F}}[e]S \\ \{ L \leadsto x \in S \mid x \notin \mathrm{w}(s\_{1}) \cup \mathrm{w}(s\_{2}) \} & \text{otherwise} \end{cases} \\ \Theta\_{\mathrm{F}}[\mathtt{while}\ e\colon s](S) &\stackrel{\text{def}}{=} \operatorname{lfp}\_{S}^{\sqsubseteq\_{\mathrm{F}}} \Theta\_{\mathrm{F}}[\mathtt{if}\ e\colon s\ \mathtt{else}\colon \mathtt{skip}] \\ \Theta\_{\mathrm{F}}[s\_{1}\ s\_{2}](S) &\stackrel{\text{def}}{=} \Theta\_{\mathrm{F}}[s\_{2}] \circ \Theta\_{\mathrm{F}}[s\_{1}](S) \end{aligned}$$

where w(s) denotes the set of variables modified by the statement s, and V<sub>F</sub>[[e]]S determines whether a set of dependencies S guarantees that the expression e has a unique value independently of the initial values of the input variables. For a variable x, V<sub>F</sub>[[x]]S is true if and only if L ⤳ x ∈ S. Otherwise, V<sub>F</sub>[[e]]S is defined recursively on the structure of e, and it is always true for a boolean value v [6]. An assignment x = e discards all dependency constraints related to the assigned variable x, and adds the constraint L ⤳ x if e has a unique value independently of the initial values of the input variables. This captures an *explicit flow* of information between e and x. A conditional statement if e: s<sub>1</sub> else: s<sub>2</sub> joins the dependency constraints obtained from s<sub>1</sub> and s<sub>2</sub> if e does not depend on the initial values of the input variables (i.e., V<sub>F</sub>[[e]]S is true). Otherwise, it discards all dependency constraints related to the variables modified in either of its branches. This captures an *implicit flow* of information from e. The initial set of dependencies contains a constraint L ⤳ x for each variable x that is not an input variable. We exemplify the analysis below.

*Example 7.* Let us consider again the program P from Example 1 (stripped of the input and print statements, which are not present in our simple language):


The analysis begins from the set of dependency constraints {L ⤳ passing}, which classifies the input variables as H and all other variables as L. The assignment at line 1 leaves the set unchanged, as the value of the expression True on the right-hand side of the assignment does not depend on the initial value of the input variables. The set also remains unchanged by the conditional statement at line 2, even though the boolean condition depends on the input variable english, because the variable passing is not modified. Finally, at lines 3 and 4, the analysis captures an explicit flow of information from the input variable bonus and an implicit flow of information from the input variable math. Thus, the set of dependency constraints becomes empty at line 3, and remains empty at line 4.

Observe that, in this case, non-interference does not hold since the result of the program depends on some of the input variables. Therefore, the analysis is only able to conclude that at least one of the input variables may be used by the program, but it cannot determine which input variables are unused. -

The example shows that non-interference is too strong a property in general. Of course, one could determine which input variables are unused by running multiple instances of the non-interference analysis Λ<sub>F</sub>, each one of them classifying a single different input variable as H and all other variables as L. However, this becomes cumbersome in a data science application where a program reads and manipulates a large amount of input data.

Moreover, we emphasize that our input data usage property is more general than (possibilistic) non-interference since it also considers non-termination. We are not aware of any work on termination-sensitive possibilistic non-interference.

*Example 8.* Let us modify the program P shown in Example 7 as follows:

```
1 passing = True
2 while not english: english = False
```
In this case, since the loop at line 2 does not modify the output variable passing, the non-interference analysis Λ<sub>F</sub> leaves the initial set of dependency constraints {L ⤳ passing} unchanged, meaning that the result of the program does not depend on any of its input variables. However, the input variable english is used, since its value influences the outcome of the program: the program terminates if english is true, and does not terminate otherwise. -
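The transfer functions Θ<sub>F</sub> above can be transcribed almost literally. In the Python sketch below (our own encoding, chosen for illustration: a statement is a tuple, an expression is represented by its set of variables, and a constraint L ⤳ x simply by the name x), running the analysis on this very program reproduces the unsound result: the constraint for passing survives the loop.

```python
def written(s):
    """w(s): the set of variables modified by statement s."""
    tag = s[0]
    if tag == "skip":   return set()
    if tag == "assign": return {s[1]}
    if tag == "if":     return written(s[2]) | written(s[3])
    if tag == "while":  return written(s[2])
    if tag == "seq":    return written(s[1]) | written(s[2])

def theta_F(s, S):
    """Θ_F[[s]](S): S holds the variables x with a constraint L ~> x,
    i.e., whose value is independent of the H-classified inputs."""
    tag = s[0]
    if tag == "skip":
        return S
    if tag == "assign":
        x, evars = s[1], s[2]              # evars: the variables of e
        return (S - {x}) | ({x} if evars <= S else set())  # V_F[[e]]S
    if tag == "if":
        bvars, s1, s2 = s[1], s[2], s[3]
        if bvars <= S:                     # no implicit flow from the guard
            return theta_F(s1, S) & theta_F(s2, S)  # join is intersection
        return {x for x in S if x not in written(s1) | written(s2)}
    if tag == "while":
        conditional = ("if", s[1], s[2], ("skip",))
        while True:                        # iterate until stable
            S2 = theta_F(conditional, S)
            if S2 == S:
                return S
            S = S2
    if tag == "seq":
        return theta_F(s[2], theta_F(s[1], S))

# passing = True; while not english: english = False
prog = ("seq", ("assign", "passing", set()),
               ("while", {"english"}, ("assign", "english", set())))
final = theta_F(prog, {"passing"})   # {"passing"} survives: unsound here
```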

The example demonstrates that the analysis is *unsound* for a non-terminating program.<sup>2</sup> We show that the non-interference analysis Λ<sub>F</sub> is sound for proving that a program does not use any of its input variables *only if the program is terminating*. We define the concretization function γ<sub>F</sub> : P(F) → P(P(Σ × Σ)):

$$\gamma\_{\mathrm{F}}(S) \stackrel{\text{def}}{=} \{ R \in \mathcal{P}\,(\Sigma \times \Sigma) \mid \alpha\_{\mathrm{F}}(R) \sqsubseteq\_{\mathrm{F}} S \} \tag{17}$$

The abstraction function α<sub>F</sub> : P(Σ × Σ) → P(F) maps each relation R between states of a program to the corresponding set of dependency constraints: α<sub>F</sub>(R) ≝ {L ⤳ x | x ∈ X<sub>L</sub> ∧ ∀i ∈ X<sub>H</sub> : unused<sub>i,x</sub>(R)}, where unused<sub>i,x</sub> is the relational abstraction of unused<sub>i</sub> (cf. Eq. 3) in which we compare only the result stored in the variable x (i.e., we compare σ[ω](x) and σ′[ω](x), instead of σ[ω] and σ′[ω] as in Eq. 3).

**Theorem 5.** *A terminating program does not use any of its input variables if the image via* γ⤳ ∘ γ<sub>F</sub> *of its non-interference abstraction* Λ<sub>F</sub> *is a subset of* N*:*

$$\gamma\_{\leadsto}(\gamma\_{\mathrm{F}}(\Lambda\_{\mathrm{F}})) \subseteq \mathcal{N} \Rightarrow P \models \mathcal{N}$$

*Proof.* Let us assume that γ⤳(γ<sub>F</sub>(Λ<sub>F</sub>)) ⊆ N. By definition of γ<sub>F</sub> (cf. Eq. 17), since the program is terminating, we have that Λ⤳ ⊆ γ<sub>F</sub>(Λ<sub>F</sub>) and, by monotonicity of the concretization function γ⤳ (cf. Eq. 11), we have that γ⤳(Λ⤳) ⊆ γ⤳(γ<sub>F</sub>(Λ<sub>F</sub>)). Thus, since γ⤳(γ<sub>F</sub>(Λ<sub>F</sub>)) ⊆ N, we have that γ⤳(Λ⤳) ⊆ N. The conclusion follows from Theorem 4. 

Note that the termination of the program is necessary for the proof of Theorem 5. Indeed, for a non-terminating program, we have that Λ⤳ ⊈ γ<sub>F</sub>(Λ<sub>F</sub>) (since Λ⤳ includes relational abstractions of infinite traces that are missing from γ<sub>F</sub>(Λ<sub>F</sub>)) and thus we cannot conclude the proof.

This result shows that the non-interference analysis Λ_F is an abstraction of the dependency semantics Λ⇝ presented earlier. Moreover, we remark that the same result applies to the other instances of this class of analyses [5,25, etc.], which are therefore also subsumed by our framework.

### **9 Strongly Live Variable Abstraction**

Strongly live variable analysis [20] is a variant of the classic live variable analysis [32] performed by compilers to determine, for each program point, which variables may potentially be used before they are assigned to. A variable is *strongly live* if it is used in an assignment to another strongly live variable, or if it is used in a statement other than an assignment. Otherwise, a variable is considered *faint*.

<sup>2</sup> The case of a program using an input variable and then always diverging is not problematic because the analysis would be imprecise but still sound.

Strongly live variable analysis Λ_X is a *backward analysis* in the complete lattice ⟨P(X), ⊆, ∪, ∩, ∅, X⟩, where X is the set of all program variables. The transfer function Θ_X[[s]] : P(X) → P(X) for each statement s is defined as:

$$\begin{aligned} \Theta\_{\mathbf{X}}[\![\mathtt{skip}]\!](S) & \stackrel{\text{def}}{=} S \\ \Theta\_{\mathbf{X}}[\![x = e]\!](S) & \stackrel{\text{def}}{=} \begin{cases} (S \setminus \{x\}) \cup \mathrm{vars}(e) & x \in S \\ S & \text{otherwise} \end{cases} \\ \Theta\_{\mathbf{X}}[\![\mathtt{if} \; b \colon s\_{1} \; \mathtt{else} \colon s\_{2}]\!](S) & \stackrel{\text{def}}{=} \mathrm{vars}(b) \cup \Theta\_{\mathbf{X}}[\![s\_{1}]\!](S) \cup \Theta\_{\mathbf{X}}[\![s\_{2}]\!](S) \\ \Theta\_{\mathbf{X}}[\![\mathtt{while} \; b \colon s]\!](S) & \stackrel{\text{def}}{=} \mathrm{vars}(b) \cup \Theta\_{\mathbf{X}}[\![s]\!](S) \\ \Theta\_{\mathbf{X}}[\![s\_{1} \; s\_{2}]\!](S) & \stackrel{\text{def}}{=} \Theta\_{\mathbf{X}}[\![s\_{1}]\!] \circ \Theta\_{\mathbf{X}}[\![s\_{2}]\!](S) \end{aligned}$$

where vars(e) is the set of variables in the expression <sup>e</sup>. For input data usage, the initial set of strongly live variables contains the output variables of the program.
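As an illustration, the transfer function Θ_X can be sketched in a few lines of Python over a toy statement representation (tuples whose expression parts are just their variable sets; all names and the encoding are ours). The while case mirrors the equation as displayed; a full analysis would iterate it to a fixpoint.

```python
# Hedged sketch of the backward transfer function Theta_X for strongly
# live variables. A statement is a tuple:
#   ('skip',), ('assign', x, vars_e), ('if', vars_b, s1, s2),
#   ('while', vars_b, s), ('seq', s1, s2)

def theta(stmt, S):
    kind = stmt[0]
    if kind == 'skip':
        return S
    if kind == 'assign':                 # x = e: strongly live only if x is
        _, x, vars_e = stmt
        return (S - {x}) | vars_e if x in S else S
    if kind == 'if':                     # if b: s1 else: s2
        _, vars_b, s1, s2 = stmt
        return vars_b | theta(s1, S) | theta(s2, S)
    if kind == 'while':                  # while b: s (no fixpoint here)
        _, vars_b, s = stmt
        return vars_b | theta(s, S)
    if kind == 'seq':                    # s1 s2, analyzed backwards
        _, s1, s2 = stmt
        return theta(s1, theta(s2, S))
    raise ValueError(kind)
```

For the statement `if not math: passing = bonus` with initial set {passing}, the sketch yields {math, bonus, passing}, showing how both the condition variable and the assigned-from variable become strongly live.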

*Example 9.* Let us consider again the program P shown in Example 7. The strongly live variable analysis begins from the set {passing} containing the output variable passing. At line 3, the set of strongly live variables is {math, bonus} since bonus is used in an assignment to the strongly live variable passing, and math is used in the condition of the if statement. Finally, at line 1, the set of strongly live variables is {english, math, bonus} because english is used in the condition of the if statement at line 2. Thus, strongly live variable analysis is able to conclude that the input variable science is unused. However, it is not precise enough to determine that the variable english is also unused.

The imprecision of the analysis derives from the fact that it does not capture implicit flows of information precisely (cf. Sect. 8) but only over-approximates their presence. Thus, the analysis is unable to detect when a conditional statement, for instance, modifies only variables that have no impact on the outcome of a program; a situation likely to arise due to a programming error, as shown in the previous example. However, by virtue of this imprecise treatment of implicit flows, we can show that strongly live variable analysis is sound for input data usage, even for non-terminating programs.

We define the concretization function γ_X : P(X) → P(P(Σ × Σ⊥)) as:

$$\gamma\_{\mathbf{X}}(S) \stackrel{\text{def}}{=} \{ R \subseteq \Sigma \times \Sigma\_{\perp} \mid \forall i \in \mathbf{X} \setminus S \colon \mathrm{unused}\_i(R) \} \tag{18}$$

where we abuse notation and use unused_i (cf. Eq. 3) to also denote its dependency abstraction (cf. Eq. 10). We now show that strongly live variable analysis is sound for proving that a program does not use the faint variables.

**Theorem 6.** *A program does not use a subset* J *of its input variables if the image via* γ⇝ ∘ γ_X *of its strongly live variable abstraction* Λ_X *is a subset of* N_J*:*

$$\gamma\_{\leadsto}(\gamma\_{\mathbf{X}}(\Lambda\_{\mathbf{X}})) \subseteq \mathcal{N}\_J \Rightarrow P \models \mathcal{N}\_J$$

*Proof.* Let us assume that γ⇝(γ_X(Λ_X)) ⊆ N_J. By definition of γ_X (cf. Eq. 18), we have that Λ⇝ ⊆ γ_X(Λ_X) and, by monotonicity of γ⇝ (cf. Eq. 11), we have that γ⇝(Λ⇝) ⊆ γ⇝(γ_X(Λ_X)). Thus, since γ⇝(γ_X(Λ_X)) ⊆ N_J, we have that γ⇝(Λ⇝) ⊆ N_J. The conclusion follows from Theorem 4. □

This result shows that strongly live variable analysis, too, is subsumed by our framework, as it is an abstraction of the dependency semantics Λ⇝.

### **10 Syntactic Dependency Abstractions**

In the following, we derive a more precise data usage analysis based on *syntactic* dependencies between program variables. For simplicity, the analysis does not take program termination into account, but we discuss possible solutions at the end of the section. Due to space limitations, we only provide a terse description of the abstraction and refer to [36] for further details.

**Fig. 5.** Hasse diagram for the complete lattice ⟨Usage, ⊑_usage, ⊔_usage, ⊓_usage, N, U⟩.

In order to capture implicit dependencies from variables appearing in boolean conditions of conditional and while statements, we track when the value of a variable is used or modified in a statement based on the level of nesting of the statement in other statements. More formally, each program variable maps to a value in the complete lattice shown in Fig. 5: the values U (*used*) and N (*not used*) respectively denote that a variable may be used and is not used at the current nesting level; the values B (*below*) and W (*overwritten*) denote that a variable may be used at a lower nesting level, and the value W additionally indicates that the variable is modified at the current nesting level.

A variable is used (i.e., maps to U) if it is used in an assignment to another variable that is used at the current or a lower nesting level (i.e., a variable that maps to U or B). We define the operator assign[[x = e]] to compute the effect of an assignment on a map m : X → Usage, where X is the set of all variables:

$$\mathrm{assign}[\![x=e]\!](m) \stackrel{\text{def}}{=} \lambda y. \begin{cases} W & y=x \land y \notin \mathrm{vars}(e) \land m(x) \in \{U, B\} \\ U & y \in \mathrm{vars}(e) \land m(x) \in \{U, B\} \\ m(y) & \text{otherwise} \end{cases} \tag{19}$$

The assigned variable is overwritten (i.e., maps to W), unless it is used in e.

Another reason for a variable to be used is if it appears in the boolean condition e of a statement that uses another variable or modifies another used variable (i.e., there exists a variable x that maps to U or W):

$$\mathrm{filter}[\![e]\!](m) \stackrel{\text{def}}{=} \lambda y. \begin{cases} U & y \in \mathrm{vars}(e) \land \exists x \in \mathrm{X} \colon m(x) \in \{U, W\} \\ m(y) & \text{otherwise} \end{cases} \tag{20}$$

We maintain a *stack* of these maps that grows or shrinks based on the level of nesting of the currently analyzed statement. More formally, a stack is a tuple ⟨m0, m1, ..., mk⟩ of variable length, where each element m0, m1, ..., mk is a map from X to Usage. In the following, we use Q to denote the set of all stacks, and we abuse notation by writing assign[[x = e]] and filter[[e]] to also denote the corresponding operators on stacks:

$$\mathrm{assign}[\![x = e]\!](\langle m\_0, m\_1, \dots, m\_k \rangle) \stackrel{\text{def}}{=} \langle \mathrm{assign}[\![x = e]\!](m\_0), m\_1, \dots, m\_k \rangle$$

$$\mathrm{filter}[\![e]\!](\langle m\_0, m\_1, \dots, m\_k \rangle) \stackrel{\text{def}}{=} \langle \mathrm{filter}[\![e]\!](m\_0), m\_1, \dots, m\_k \rangle$$

The operator push duplicates the map at the top of the stack and modifies the copy using the operator inc, to account for an increased nesting level:

$$\mathrm{push}(\langle m\_0, m\_1, \dots, m\_k \rangle) \stackrel{\text{def}}{=} \langle \mathrm{inc}(m\_0), m\_0, m\_1, \dots, m\_k \rangle$$

$$\text{inc}(m) \stackrel{\text{def}}{=} \lambda y. \begin{cases} B & m(y) \in \{U\} \\ N & m(y) \in \{W\} \\ m(y) & \text{otherwise} \end{cases} \tag{21}$$

A used variable (i.e., mapping to U) becomes used below (i.e., now maps to B), and a modified variable (i.e., mapping to W) becomes unused (i.e., now maps to N). The dual operator pop combines the two maps at the top of the stack:

$$\mathrm{pop}(\langle m\_0, m\_1, \dots, m\_k \rangle) \stackrel{\text{def}}{=} \langle \mathrm{dec}(m\_0, m\_1), m\_2, \dots, m\_k \rangle$$

$$\mathrm{dec}(m, k) \stackrel{\text{def}}{=} \lambda y. \begin{cases} k(y) & m(y) \in \{B, N\} \\ m(y) & \text{otherwise} \end{cases} \tag{22}$$

where the dec operator restores the value a variable y mapped to before increasing the nesting level (i.e., k(y)) if it has not changed since (i.e., if the variable still maps to B or N), and otherwise retains the new value y maps to.

We can now define the data usage analysis Λ_Q, which is a *backward analysis* on the lattice ⟨Q, ⊑_Q, ⊔_Q⟩. The partial order ⊑_Q and the least upper bound ⊔_Q are the pointwise lifting, for each element of the stack, of the partial order and least upper bound between maps from X to Usage (which in turn are the pointwise lifting of the partial order ⊑_usage and least upper bound ⊔_usage of the usage lattice, cf. Fig. 5). We define the transfer function Θ_Q[[s]] : Q → Q for each statement s in our simple programming language as follows:

```
math, bonus → U, passing → W   ⊔Q   passing → U   =   math, bonus, passing → U
if not math:
    bonus → U, passing → W | passing → U
    passing = bonus
    passing → B | passing → U
passing → U
```
**Fig. 6.** Data usage analysis of the last statement of the program shown in Example 7. Stack elements are separated by | and, for brevity, variables mapping to N are omitted.

$$\begin{aligned}
\Theta\_{\mathrm{Q}}[\![\mathtt{skip}]\!](q) &\stackrel{\text{def}}{=} q \\
\Theta\_{\mathrm{Q}}[\![x = e]\!](q) &\stackrel{\text{def}}{=} \mathrm{assign}[\![x = e]\!](q) \\
\Theta\_{\mathrm{Q}}[\![\mathtt{if}\ b \colon s\_1\ \mathtt{else} \colon s\_2]\!](q) &\stackrel{\text{def}}{=} \mathrm{pop} \circ \mathrm{filter}[\![b]\!] \circ \Theta\_{\mathrm{Q}}[\![s\_1]\!] \circ \mathrm{push}(q) \sqcup\_{\mathrm{Q}} \mathrm{pop} \circ \mathrm{filter}[\![b]\!] \circ \Theta\_{\mathrm{Q}}[\![s\_2]\!] \circ \mathrm{push}(q) \\
\Theta\_{\mathrm{Q}}[\![\mathtt{while}\ b \colon s]\!](q) &\stackrel{\text{def}}{=} \mathrm{lfp}^{\sqsubseteq\_{\mathrm{Q}}}\, \lambda t.\ \Theta\_{\mathrm{Q}}[\![\mathtt{if}\ b \colon s\ \mathtt{else} \colon \mathtt{skip}]\!](t) \\
\Theta\_{\mathrm{Q}}[\![s\_1\ s\_2]\!](q) &\stackrel{\text{def}}{=} \Theta\_{\mathrm{Q}}[\![s\_1]\!] \circ \Theta\_{\mathrm{Q}}[\![s\_2]\!](q)
\end{aligned}$$

The initial stack contains a single map, in which the output variables map to the value U, and all other variables map to N. We exemplify the analysis below.
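The stack-based rules above can be sketched in Python over the same toy statement encoding used earlier (tuples whose expression parts are their variable sets; all names, the encoding, and the three-variable initial stack are ours, and the while/widening machinery is omitted):

```python
# Hedged sketch of the stack-based data usage analysis Theta_Q.
# A stack is a list of dicts mapping variable names to usage values.

N, B, W, U = 'N', 'B', 'W', 'U'

def assign_map(x, vars_e, m):                  # Eq. 19
    def upd(y):
        if m[x] in (U, B):
            if y == x and y not in vars_e:
                return W
            if y in vars_e:
                return U
        return m[y]
    return {y: upd(y) for y in m}

def filter_map(vars_b, m):                     # Eq. 20
    hit = any(v in (U, W) for v in m.values())
    return {y: (U if hit and y in vars_b else m[y]) for y in m}

def push(q):                                   # Eq. 21: <inc(m0), m0, ...>
    step = {U: B, W: N}
    return [{y: step.get(v, v) for y, v in q[0].items()}] + q

def pop(q):                                    # Eq. 22: <dec(m0, m1), m2, ...>
    m, k = q[0], q[1]
    return [{y: (k[y] if m[y] in (B, N) else m[y]) for y in m}] + q[2:]

ORDER = {N: 0, B: 1, W: 1, U: 2}               # B and W are incomparable

def join(q1, q2):                              # pointwise least upper bound
    def lub(a, b):
        if a == b: return a
        if {a, b} == {B, W}: return U          # lub of the incomparable pair
        return a if ORDER[a] > ORDER[b] else b
    return [{y: lub(m1[y], m2[y]) for y in m1} for m1, m2 in zip(q1, q2)]

def theta(stmt, q):                            # backward transfer function
    kind = stmt[0]
    if kind == 'skip':
        return q
    if kind == 'assign':                       # x = e
        _, x, vars_e = stmt
        return [assign_map(x, vars_e, q[0])] + q[1:]
    if kind == 'if':                           # if b: s1 else: s2
        _, vars_b, s1, s2 = stmt
        def branch(s):
            q2 = theta(s, push(q))
            return pop([filter_map(vars_b, q2[0])] + q2[1:])
        return join(branch(s1), branch(s2))
    if kind == 'seq':                          # s1 s2, analyzed backwards
        _, s1, s2 = stmt
        return theta(s1, theta(s2, q))
    raise ValueError(kind)
```

On the statement of Fig. 6, `if not math: passing = bonus`, starting from a stack where only passing maps to U, the sketch reproduces the merged result in which math, bonus, and passing all map to U.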

*Example 10.* Let us consider again the program P shown in Example 7. The initial stack contains a single map m, in which the output variable passing maps to U and all other variables map to N.

At line 4, before analyzing the body of the conditional statement, a modified copy of m is pushed onto the stack: this copy maps passing to B, meaning that passing is only used at a lower nesting level, and all other variables still map to N (cf. Eq. 21). As a result of the assignment (cf. Eq. 19), passing is overwritten (i.e., maps to W), and bonus is used (i.e., maps to U). Since the body of the conditional statement modifies a used variable and uses another variable, the analysis of its boolean condition makes math used as well (cf. Eq. 20). Finally, the maps at the top of the stack are merged and the result maps math, bonus, and passing to U, and all other variables to N (cf. Eq. 22). The analysis is visualized in Fig. 6.

The stack remains unchanged at line 3 and line 2, since the statement at line 3 is identical to line 4 and the body of the conditional statement at line 2 does not modify any used variable and does not use any other variable. Finally, at line 1 the variable passing is modified (i.e., it now maps to W), while math and bonus remain used (i.e., they map to U). Thus, the analysis is precise enough to conclude that the input variables english and science are unused.

Note that, similarly to the non-interference analysis presented in Sect. 8, the data usage analysis Λ_Q does not consider non-termination. Indeed, for the program shown in Example 8, the analysis does not capture that the input variable english is used, even though the termination of the program depends on its value. We define the concretization function γ_Q : Q → P(P(Σ × Σ)) as:

$$\gamma\_{\mathrm{Q}}(\langle m\_0, \dots, m\_k \rangle) \stackrel{\text{def}}{=} \{ R \subseteq \Sigma \times \Sigma \mid \forall i \in \mathrm{X} \colon m\_0(i) \in \{N\} \Rightarrow \mathrm{unused}\_i(R) \} \tag{23}$$

where again we write unused_i (cf. Eq. 3) to also denote its dependency abstraction. We now show that Λ_Q is sound for proving that a program does not use a subset of its input variables, *if the program is terminating*.

**Theorem 7.** *A terminating program does not use a subset* J *of its input variables if the image via* γ⇝ ∘ γ_Q *of its abstraction* Λ_Q *is a subset of* N_J*:*

$$\gamma\_{\leadsto}(\gamma\_{\mathrm{Q}}(\Lambda\_{\mathrm{Q}})) \subseteq \mathcal{N}\_J \Rightarrow P \models \mathcal{N}\_J$$

*Proof.* Let us assume that γ⇝(γ_Q(Λ_Q)) ⊆ N_J. Since the program is terminating, we have that Λ⇝ ⊆ γ_Q(Λ_Q), by definition of the concretization function γ_Q (cf. Eq. 23). Then, by monotonicity of γ⇝ (cf. Eq. 11), we have that γ⇝(Λ⇝) ⊆ γ⇝(γ_Q(Λ_Q)). Thus, since γ⇝(γ_Q(Λ_Q)) ⊆ N_J, we have that γ⇝(Λ⇝) ⊆ N_J. The conclusion follows from Theorem 4. □

In order to take termination into account, one could map each variable appearing in the guard of a loop to the value U. Alternatively, one could run a termination analysis [3,33,34], along with the data usage analysis, and only map to U variables appearing in the loop guard of a possibly non-terminating loop.

### **11 Piecewise Abstractions**

The static analyses presented so far can be used only to detect unused data stored in program variables. However, realistic data science applications read and manipulate data organized in data structures such as arrays, lists, and dictionaries. In the following, we demonstrate that having expressed the analyses as abstract domains allows us to easily lift the analyses to such a scenario. In particular, to detect unused chunks of the input data, we combine the more precise data usage analysis presented in the previous section with the array content abstraction proposed by Cousot et al. [16]. Due to space limitations, we provide only an informal description of the resulting abstract domain and refer to [36] for further details and examples. The analyses presented in earlier sections can be similarly combined with the array abstraction for the same purpose.

We extend our small programming language introduced in Sect. 7 with integer variables, arithmetic and boolean comparison expressions, and arrays:

$$\begin{aligned} e & ::= \cdots \mid a[e] \mid \mathbf{len}(a) \mid e \oplus e \mid e \bowtie e && \text{(expressions)} \\ s & ::= \cdots \mid a[e] = e && \text{(statements)} \end{aligned}$$

where ⊕ and ⋈ respectively range over arithmetic and boolean comparison operators, a ranges over array variables, and len(a) denotes the length of a.

*Piecewise Array Abstraction.* The array abstraction [16] divides an array into consecutive segments, each segment being a uniform abstraction of the array content in that segment. The bounds of the segments are specified by sets of side-effect free expressions restricted to a canonical normal form, all having the same (concrete) value. The abstraction is parametric in the choice of the abstract domains used to manipulate sets of expressions and to represent the array content within each segment. For our analysis, we use the octagon abstract domain [31] for the expressions, and the usage lattice presented in the previous section (cf. Fig. 5) for the segments. Thus, an array a is abstracted, for instance, as {0, i} N {j + 1}? U {len(a)}, where the symbol ? indicates that the segment {0, i} N {j + 1} might be empty. The abstraction indicates that all array elements (if any) from index i (which is equal to zero) to index j (the bound j + 1 is exclusive) are unused, and all elements from j + 1 to len(a) − 1 may be used. Let A be the set of all such array abstractions. The initial segmentation of an array a ∈ A is a single segment with unused content (i.e., {0} N {len(a)}?).

For our analysis, we augment the array abstraction with new backward assignment and filter operators. The operators assign_A[[a[i] = e]] and filter_A[[e]] split and fill segments to take into account assignments and accesses to array elements that influence the program outcome. For instance, an assignment to a[i] with an expression containing a used variable modifies the segmentation {0} N {len(a)}? into {0} N {i}? U {i + 1} N {len(a)}?, which indicates that the array element at index i is used by the program. An access a[i] in a boolean condition guarding a statement that uses or modifies another used variable is handled analogously. Instead, the operator assign_A[[x = e]] modifies the segmentation of an array by replacing each occurrence of the assigned variable x with the canonical normal form of the expression e. For instance, an assignment i = i + 1 modifies the segmentation {0} N {i}? U {i + 1} N {len(a)}? into {0} N {i + 1}? U {i + 2} N {len(a)}?. If e cannot be precisely put into a canonical normal form, the operator replaces the assigned variable with an approximation of e as an integer interval [13] computed using the underlying numerical domain, and possibly merges segments together as a result of the approximation. For instance, a non-linear assignment i = i * j approximated as i = [0, 1] modifies the segmentation {0} N {i}? U {i + 1} N {len(a)}? into {0} U {2} N {len(a)}?, which loses the information that the initial segment of the array is unused.
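As a toy illustration of the bound substitution performed by assign_A[[i = i + 1]] (our own simplified encoding; the actual domain uses octagons and canonical-form expressions, and we omit the emptiness markers):

```python
# Hedged toy model of segment bounds. A bound is an int constant, the
# symbol 'len', or a tuple ('i', offset) meaning i + offset; a
# segmentation alternates bounds and usage values ('N'/'B'/'W'/'U').

def subst_incr(seg, delta):
    """Model substituting i -> i + delta in every bound of a segmentation."""
    return [(('i', b[1] + delta) if isinstance(b, tuple) and b[0] == 'i' else b)
            for b in seg]
```

On the segmentation {0} N {i} U {i + 1} N {len(a)}, the substitution for i = i + 1 shifts every bound mentioning i, yielding {0} N {i + 1} U {i + 2} N {len(a)}, as in the example above.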

When merging control flows, segmentations are compared or joined by means of a *unification algorithm* [16], which finds the coarsest common refinement of both segmentations. Then, the comparison ⊑_A or the join ⊔_A is performed pointwise for each segment using the corresponding operators of the underlying abstract domain chosen to abstract the array content. For our analysis, we adapt and refine the originally proposed unification algorithm to take into account the knowledge of the numerical domain chosen to abstract the segment bounds. We refer to [36] for further details. A widening ∇_A limits the number of segments to enforce termination of the analysis.

*Piecewise Data Usage Analysis.* We can now map each scalar variable to an element of the usage lattice and each array variable to an array segmentation

```
1 failed = 0
2 i = 1  # 1 should be 0
3 while i < len(grades):
4     if grades[i] < 4: failed = failed + 1
5     i = i + 1
6 passing = 2 * failed < len(grades)
```
**Fig. 7.** Another program to check if a student has passed a number of exams based on their grades stored in the array grades. The programmer has made a mistake at line 2 that causes the program to ignore the grade stored at index 0 in grades.
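The program of Fig. 7 runs as-is in Python once the extraction spacing is restored; making the initial index a parameter (our own modification, for illustration) exposes the effect of the off-by-one at line 2:

```python
# The program of Fig. 7, with the starting index made a parameter so the
# buggy (1) and intended (0) versions can be compared side by side.

def passing(grades, start):
    failed = 0
    i = start                      # Fig. 7 uses 1; the intended value is 0
    while i < len(grades):
        if grades[i] < 4:
            failed = failed + 1
        i = i + 1
    return 2 * failed < len(grades)
```

With grades = [3, 6], the buggy start of 1 skips the failing grade at index 0 and reports a pass, while the intended start of 0 does not; this is exactly the unused array element the piecewise analysis detects.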

```
grades → {0} N {i}? U {i + 1}? U {len(grades)}?
while i < len(grades):
    grades → {0} N {i}? U {i + 1}? B {i + 2}? B {len(grades)}? | ...
    if grades[i] < 4:
        grades → {0} N {i + 1}? B {i + 2}? B {len(grades)}? |···| ...
        failed = failed + 1
        grades → {0} N {i + 1}? B {i + 2}? B {len(grades)}? |···| ...
    grades → {0} N {i + 1}? B {i + 2}? B {len(grades)}? | ...
    i = i + 1
    grades → {0} N {i}? B {i + 1}? B {len(grades)}? | ...
grades → {0} N {len(grades)}?
```
**Fig. 8.** Data usage analysis of the loop statement of the program shown in Example 11. Stack elements are separated by | and, for brevity, only array variables are shown.

in A, and use the data usage analysis Λ_Q presented in the previous section to identify unused input data stored in variables and portions of arrays.

*Example 11.* Let us consider the program shown in Fig. 7 where the array variable grades and the variable passing are the input and output variables, respectively. The initial stack contains a single map in which passing maps to U, all other scalar variables map to N, and grades maps to {0} N {len(grades)}?, indicating that all elements of the array (if any) are unused.

At line 6, the assignment modifies the variable passing (i.e., passing now maps to W) and uses the variable failed (i.e., failed now maps to U), while every other variable remains unchanged.

The result of the analysis of the loop statement at line 3 is shown in Fig. 8. The analysis of the loop begins by pushing (cf. Eq. 21) a map onto the stack in which passing becomes unused (i.e., maps to N) and failed is used only at a lower nesting level (i.e., maps to B), while every other variable remains unchanged. At the first iteration of the analysis of the loop body, the assignment at line 4 uses failed and thus the access grades[i] at line 3 creates a used segment in the segmentation for grades, which becomes {0} N {i}? U {i + 1} N {len(grades)}?. At the second iteration, the push operator turns the used segment {i} U {i + 1} into {i} B {i + 1}, and the assignment to i modifies the segment into {i + 1} B {i + 2} (while the segmentation in the second stack element becomes {0} N {i + 1}? U {i + 2} N {len(grades)}?). Then, the access to the array at line 3 again creates a used segment {i} U {i + 1} (in the first segmentation) and the analysis continues with the result of the pop operator (cf. Eq. 22): {0} N {i}? U {i + 1}? U {i + 2}? N {len(grades)}?. After widening, the last two segments are merged into a single segment, and the analysis of the loop terminates with {0} N {i}? U {i + 1}? U {len(grades)}?.

Finally, the analysis of the assignment at line 2 produces the segmentation {0} N {1}? U {2}? U {len(grades)}?, which correctly indicates that the first element of the array grades (if any) is unused by the program.

*Implementation.* The analyses presented in this and the previous section are implemented in the prototype static analyzer Lyra and are available online<sup>3</sup>.

The implementation is in Python and, at the time of writing, accepts programs written in a limited subset of Python without user-defined classes. A type inference is run before the analysis of a program. The analysis is performed backwards on the control flow graph of the program with a standard worklist algorithm [32], using widening at loop heads to enforce termination.
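The standard backward worklist algorithm mentioned above can be sketched generically as follows (the skeleton and all names are ours; widening is omitted since the example lattice below is finite):

```python
# Hedged sketch of a backward worklist algorithm. The CFG is given as a
# node list and a predecessor map; state[n] holds the dataflow fact at
# the exit of node n, and transfer(n, fact) computes the entry fact.

def backward_worklist(nodes, preds, transfer, join, init):
    state = {n: init for n in nodes}
    work = list(nodes)
    while work:
        n = work.pop()
        entry = transfer(n, state[n])
        for p in preds[n]:                  # propagate backwards
            new = join(state[p], entry)
            if new != state[p]:             # re-process p only on change
                state[p] = new
                work.append(p)
    return state
```

For a two-node liveness instance (node 1: `x = y`; node 2: uses `x`), the algorithm converges with `x` live at the exit of node 1.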

### **12 Related Work**

The most directly relevant work has been discussed throughout the paper. The non-interference analysis proposed by Assaf et al. [6] (cf. Sect. 8) is similar to the logic of Amtoft and Banerjee [5] and the type system of Hunt and Sands [25]. The data usage analysis proposed in Sect. 10 is similar to dependency analyses used for program slicing [37] (e.g., [24]). Both analyses as well as strongly live variable analysis (cf. Sect. 9) are based on the *syntactic* presence of a variable in the definition of another variable. To overcome this limitation, one should look further for *semantic* dependencies between *values* of program variables. In this direction, Giacobazzi, Mastroeni, and others [19,22,29] have proposed the notion of *abstract dependency*. However, note that an analysis based on abstract dependencies would over-approximate the subset of the input variables that are unused by a program. Indeed, the absence of an abstract dependency between variables (e.g., a dependency between the parity of the variables [19,29]) does not imply the absence of a (concrete) dependency between the variables (i.e., a dependency between the values of the variables). Thus, such an analysis could not be used to prove that a program *does not use* a subset of its input variables, but rather to prove that a program *uses* a subset of its input variables.

Semantics formulations using *sets of sets of traces* have already been proposed in the literature [6,28]. Mastroeni and Pasqua [28] lift the hierarchy of semantics developed by Cousot [12] to sets of sets of traces to obtain a hierarchy of semantics suitable for verifying general program properties (i.e., properties that are not subset-closed, cf. Sect. 7). However, *none* of the semantics that they proposed is suitable for input data usage: all semantics in the hierarchy are abstractions of a semantics that contains sets with both finite and infinite traces

<sup>3</sup> http://www.pm.inf.ethz.ch/research/lyra.html.

and thus, unlike our outcome semantics (cf. Sect. 5), cannot be used to reason about terminating and non-terminating outcomes of a program. Similarly, as observed in [28], the semantics proposed by Assaf et al. [6] can be used to verify only subset-closed properties. Thus, it cannot be used for input data usage.

Finally, to the best of our knowledge, our work is the first to aim at detecting programming errors in data science code using static analysis. Closely related are [7,10] which, however, focus on spreadsheet applications and target errors in the data rather than the code that analyzes it. Recent work [2] proposes an approach to repair *bias* in data science code. We believe that our work can be applied in this context to prove absence of bias, e.g., by showing that a program does not use gender information to decide whether to hire a person.

### **13 Conclusion and Future Work**

In this paper, we have proposed an abstract interpretation framework to automatically detect input data that remains unused by a program. Additionally, we have shown that existing static analyses based on dependencies are subsumed by our unifying framework and can be used, with varying degrees of precision, for proving that a program does not use some of its input data. Finally, we have proposed a data usage analysis for more realistic data science applications that store input data in compound data structures such as arrays or lists.

As part of our future work, we plan to use our framework to guide the design of new, more precise static analyses for data usage. We also want to explore the complementary direction of proving that a program *uses* its input data by developing an analysis based on abstract dependencies [19,22,29] between program variables, as discussed above. Additionally, we plan to investigate other applications of our work such as provenance or lineage analysis [9] as well as proving absence of algorithmic bias [2]. Finally, we want to study other programming errors related to data usage such as accidental data duplication.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Higher-Order Program Verification via HFL Model Checking**

Naoki Kobayashi, Takeshi Tsukada, and Keiichi Watanabe

> The University of Tokyo, Tokyo, Japan koba@is.s.u-tokyo.ac.jp

**Abstract.** There are two kinds of higher-order extensions of model checking: HORS model checking and HFL model checking. Whilst the former has been applied to automated verification of higher-order functional programs, applications of the latter have not been well studied. In the present paper, we show that various verification problems for functional programs, including may/must-reachability, trace properties, and linear-time temporal properties (and their negations), can be naturally reduced to (extended) HFL model checking. The reductions yield a sound and complete logical characterization of those program properties. Compared with the previous approaches based on HORS model checking, our approach provides a more uniform, streamlined method for higher-order program verification.

### **1 Introduction**

There are two kinds of higher-order extensions of model checking in the literature: HORS model checking [16,32] and HFL model checking [42]. The former is concerned with whether the tree generated by a given higher-order tree grammar, called a higher-order recursion scheme (HORS), satisfies the property expressed by a given modal μ-calculus formula (or a tree automaton); the latter is concerned with whether a given finite-state system satisfies the property expressed by a given formula of higher-order modal fixpoint logic (HFL), a higher-order extension of the modal μ-calculus. Whilst HORS model checking has been applied to automated verification of higher-order functional programs [17,18,22,26,33,41,43], there have been few studies on applications of HFL model checking to program/system verification. Although HFL was introduced more than 10 years ago, we are only aware of applications to assume-guarantee reasoning [42] and process equivalence checking [28].

In the present paper, we show that various verification problems for higher-order functional programs can actually be reduced to (extended) HFL model checking in a rather natural manner. We briefly explain the idea of our reduction below.<sup>1</sup> We translate a program to an HFL formula that says "the program has a valid behavior" (where the *validity* of a behavior depends on each verification

© The Author(s) 2018

<sup>1</sup> In this section, we use only a fragment of HFL that can be expressed in the modal μ-calculus. Some familiarity with the modal μ-calculus [25] would help.

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 711–738, 2018. https://doi.org/10.1007/978-3-319-89884-1_25

problem). Thus, a program is actually mapped to a *property*, and a program property is mapped to a system to be verified; this has been partially inspired by the recent work of Kobayashi et al. [19], where HORS model checking problems have been translated to HFL model checking problems by switching the roles of models and properties.

For example, consider a simple program fragment read(x); close(x) that reads and then closes a file (pointer) x. The transition system in Fig. 1 shows a valid access protocol to read-only files. Then, the property that a read operation is allowed in the current state can be expressed by a formula of the form ⟨read⟩ϕ, which says that the current state has a read-transition, after which ϕ is satisfied. Thus, the program read(x); close(x) being valid is expressed as ⟨read⟩⟨close⟩**true**,<sup>2</sup> which is indeed satisfied by the initial state q0 of the transition system in Fig. 1. Here, we have just replaced the operations read and close of the program with the corresponding modal operators ⟨read⟩ and ⟨close⟩. We can also naturally deal with branches and recursions. For example, consider the program close(x) □ (read(x); close(x)), where e1 □ e2 represents a non-deterministic choice between e1 and e2. Then the property that the program always accesses x in a valid manner can be expressed by (⟨close⟩**true**) ∧ (⟨read⟩⟨close⟩**true**). Note that we have just replaced the non-deterministic branch with the logical conjunction, as we wish here to require that the program's behavior is valid in *both* branches. We can also deal with conditional branches if HFL is extended with predicates; **if** b **then** close(x) **else** (read(x); close(x)) can be translated to (b ⇒ ⟨close⟩**true**) ∧ (¬b ⇒ ⟨read⟩⟨close⟩**true**). Let us also consider the recursive function f defined by:

$$f\,x = \mathtt{close}(x) \mathbin{\Box} (\mathtt{read}(x); \mathtt{read}(x); f\,x).$$

Then, the program f x being valid can be represented by using a (greatest) fixpoint formula:

$$\nu F.(\langle \mathtt{close} \rangle \mathbf{true}) \land (\langle \mathtt{read} \rangle \langle \mathtt{read} \rangle F).$$

If the state q₀ satisfies this formula (which is indeed the case), then we know that all the file accesses made by f x are valid. So far, we have used only modal μ-calculus formulas. If we wish to express the validity of higher-order programs, we need HFL formulas; such examples are given later.
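To see concretely why q₀ satisfies the formula, one can evaluate the greatest fixpoint by Kleene iteration from the top of the finite lattice 2^U. The following Python sketch is our own illustration, not part of the paper's development; the two-state LTS transcribes Fig. 1 (q₀ has a read self-loop and a close-transition to q₁):

```python
states = {"q0", "q1"}
trans = {("q0", "read", "q0"), ("q0", "close", "q1")}

def dia(a, S):
    # [[<a>phi]]: states with an a-transition into S
    return {s for (s, l, t) in trans if l == a and t in S}

# Kleene iteration from the top of 2^states computes the greatest fixpoint
# nu F. (<close>true) /\ (<read><read>F), where [[true]] = states.
F = set(states)
while True:
    F2 = dia("close", states) & dia("read", dia("read", F))
    if F2 == F:
        break
    F = F2

print(F)   # {'q0'}: the initial state satisfies the formula
```

The iteration stabilizes after one step here, since dia("close", states) already equals {q0}.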

**Fig. 1.** File access protocol

<sup>2</sup> Here, for the sake of simplicity, we assume that we are interested in the usage of the single file pointer x, so that the name x can be ignored in HFL formulas; usage of multiple files can be tracked by using the technique of [17].

We generalize the above idea and formalize reductions from various classes of verification problems for simply-typed higher-order functional programs with recursion, integers, and non-determinism (including verification of may/must-reachability, trace properties, and linear-time temporal properties, and their negations) to (extended) HFL model checking, where HFL is extended with integer predicates, and prove soundness and completeness of the reductions. The extended HFL model checking problems obtained by the reductions are (necessarily) undecidable in general, but for finite-data programs (i.e., programs that consist of only functions and data from finite data domains such as Booleans), the reductions yield *pure* HFL model checking problems, which are decidable [42].

Our reductions provide sound and complete logical characterizations of the wide range of program properties mentioned above. Nice properties of the logical characterizations include: (i) (like verification conditions for Hoare triples) once the logical characterization is obtained as an HFL formula, purely logical reasoning can be used to prove or disprove it (without further referring to the program semantics); for that purpose, one may use theorem provers with various degrees of automation, ranging from interactive ones like Coq, to semi-automated ones requiring some annotations, to fully automated ones (though the latter two are yet to be implemented); (ii) (unlike the standard verification condition generation for Hoare triples using invariant annotations) the logical characterization can be computed *automatically*, without any annotations;<sup>3</sup> (iii) standard logical reasoning can be applied based on the semantics of formulas; for example, coinduction and induction can be used for proving ν- and μ-formulas respectively; and (iv) thanks to the completeness, the set of program properties characterizable by HFL formulas is closed under negation; for example, from a formula characterizing may-reachability, one can obtain a formula characterizing non-reachability by just taking the De Morgan dual.

Compared with previous approaches based on HORS model checking [18,22,26,33,37], our approach based on (extended) HFL model checking provides more uniform, streamlined methods for higher-order program verification. HORS model checking provides sound and complete verification methods for *finite-data* programs [17,18], but for infinite-data programs, other techniques such as predicate abstraction [22] and program transformation [27,31] had to be combined to obtain sound (but incomplete) reductions to HORS model checking. Furthermore, the techniques were different for each program property, such as reachability [22], termination [27], non-termination [26], fair termination [31], and fair non-termination [43]. In contrast, our reductions are sound and complete even for infinite-data programs. Although the obtained HFL model checking problems are undecidable in general, the reductions allow us to treat various program properties uniformly; all the verification problems boil down to the issue of how to prove μ- and ν-formulas (and as remarked above, we can use induction and coinduction to deal with them). Technically, our reduction to HFL model

<sup>3</sup> This does not mean that invariant discovery is unnecessary; invariant discovery is just postponed to the later phase of discharging verification conditions, so that it can be uniformly performed among various verification problems.

checking may actually be considered an extension of HORS model checking in the following sense. HORS model checking algorithms [21,32] usually consist of two phases, one for computing a kind of higher-order "procedure summaries" in the form of variable profiles [32] or intersection types [21], and the other for nested least/greatest fixpoint computations. Our reduction from program verification to extended HFL model checking (the reduction given in Sect. 7, in particular) can be regarded as an extension of the first phase to deal with infinite data domains, where the problem for the second phase is expressed in the form of extended HFL model checking: see [23] for more details.

The rest of this paper is structured as follows. Section 2 introduces HFL extended with integer predicates and defines the HFL model checking problem. Section 3 informally demonstrates some examples of reductions from program verification problems to HFL model checking. Section 4 introduces a functional language used to formally discuss the reductions in later sections. Sections 5, 6, and 7 consider may/must-reachability, trace properties, and temporal properties respectively, and present (sound and complete) reductions from verification of those properties to HFL model checking. Section 8 discusses related work, and Sect. 9 concludes the paper. Proofs are found in an extended version [23].

### **2 (Extended) HFL**

In this section, we introduce an extension of higher-order modal fixpoint logic (HFL) [42] with integer predicates (which we call HFL**<sup>Z</sup>**; we often drop the subscript and write HFL, as in Sect. 1), and define the HFL**<sup>Z</sup>** model checking problem. The set of integers can actually be replaced by another infinite set X of data (like the set of natural numbers or the set of finite trees) to yield HFL**<sup>X</sup>**.

### **2.1 Syntax**

For a map f, we write *dom*(f) and *codom*(f) for the domain and codomain of f respectively. We write **Z** for the set of integers, ranged over by the metavariable n below. We assume a set **Pred** of primitive predicates on integers, ranged over by p, and write arity(p) for the arity of p. We assume that **Pred** contains standard integer predicates such as = and <, and also that, for each predicate p ∈ **Pred**, there exists a predicate ¬p ∈ **Pred** such that, for any integers n₁,...,n_k, p(n₁,...,n_k) holds if and only if ¬p(n₁,...,n_k) does not hold; thus, ¬p(n₁,...,n_k) should be parsed as (¬p)(n₁,...,n_k), but can semantically be interpreted as ¬(p(n₁,...,n_k)).

The syntax of HFL**<sup>Z</sup>** *formulas* is given by:

$$\begin{array}{rcl} \varphi \ \text{(formulas)} &::=& n \mid \varphi\_1 \mathbin{op} \varphi\_2 \mid \mathbf{true} \mid \mathbf{false} \mid p(\varphi\_1, \ldots, \varphi\_k) \mid \varphi\_1 \lor \varphi\_2 \mid \varphi\_1 \land \varphi\_2 \\ && \mid\ \langle a \rangle \varphi \mid [a] \varphi \mid X \mid \mu X^{\tau}.\varphi \mid \nu X^{\tau}.\varphi \mid \lambda X\colon\sigma.\varphi \mid \varphi\_1\, \varphi\_2 \end{array}$$

Here, op ranges over a set of binary operations on integers, such as +, and X ranges over a denumerable set of variables. We have extended the original HFL [42] with integer expressions (n and ϕ₁ op ϕ₂) and atomic formulas p(ϕ₁,...,ϕ_k) on integers (the arguments of integer operations and predicates will be restricted to integer expressions by the type system introduced below). Following [19], we have omitted negations, as any formula can be transformed to an equivalent negation-free formula [30].

We explain the meaning of each formula informally; the formal semantics is given in Sect. 2.2. Like the modal μ-calculus [10,25], each formula expresses a property of a labeled transition system. The first line of the syntax of formulas consists of the standard constructs of predicate logics. On the second line, as in the standard modal μ-calculus, ⟨a⟩ϕ means that there exists an a-labeled transition to a state that satisfies ϕ. The formula [a]ϕ means that after any a-labeled transition, ϕ is satisfied. The formulas μX^τ.ϕ and νX^τ.ϕ represent the least and greatest fixpoints (the least and greatest X such that X = ϕ) respectively; unlike in the modal μ-calculus, X may range over not only propositional variables but also higher-order predicate variables (of type τ). The λ-abstractions λX:σ.ϕ and applications ϕ₁ ϕ₂ are used to manipulate higher-order predicates. We often omit type annotations in μX^τ.ϕ, νX^τ.ϕ, and λX:σ.ϕ, and just write μX.ϕ, νX.ϕ, and λX.ϕ.

*Example 1.* Consider ϕ_ab ϕ, where ϕ_ab = μX^{•→•}.λY:•. Y ∨ ⟨a⟩(X(⟨b⟩Y)). We can expand the formula as follows:

$$\begin{split} \varphi\_{\mathtt{ab}}\,\varphi &= (\lambda Y\colon\bullet.\, Y \lor \langle \mathtt{a} \rangle (\varphi\_{\mathtt{ab}}(\langle \mathtt{b} \rangle Y)))\,\varphi = \varphi \lor \langle \mathtt{a} \rangle (\varphi\_{\mathtt{ab}}(\langle \mathtt{b} \rangle \varphi)) \\ &= \varphi \lor \langle \mathtt{a} \rangle (\langle \mathtt{b} \rangle \varphi \lor \langle \mathtt{a} \rangle (\varphi\_{\mathtt{ab}}(\langle \mathtt{b} \rangle \langle \mathtt{b} \rangle \varphi))) = \cdots \end{split}$$

and obtain ϕ ∨ (⟨a⟩⟨b⟩ϕ) ∨ (⟨a⟩⟨a⟩⟨b⟩⟨b⟩ϕ) ∨ ···. Thus, the formula means that there is a transition sequence of the form a^n b^n for some n ≥ 0 that leads to a state satisfying ϕ.
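For a finite LTS, the higher-order fixpoint ϕ_ab can also be computed by Kleene iteration, now over a table representing the predicate transformer extensionally (an element of the finite function lattice 2^U → 2^U). The Python sketch below is our own toy; the three-state LTS is hypothetical, chosen so that ⟦⟨c⟩**true**⟧ = {q2}:

```python
from itertools import combinations

# Hypothetical LTS: q0 --a--> q1 --b--> q2 --c--> q2.
states = {"q0", "q1", "q2"}
trans = {("q0", "a", "q1"), ("q1", "b", "q2"), ("q2", "c", "q2")}

def dia(a, S):
    # [[<a>phi]]: states with an a-transition into S
    return {s for (s, l, t) in trans if l == a and t in S}

# All subsets of `states`, used to tabulate a predicate transformer.
subsets = [frozenset(c) for r in range(len(states) + 1)
           for c in combinations(sorted(states), r)]

# Kleene iteration for mu X. lam Y. Y \/ <a>(X(<b>Y)), starting from the
# bottom element lam Y. {} of the finite function lattice.
X = {S: frozenset() for S in subsets}
while True:
    X2 = {S: frozenset(S | dia("a", X[frozenset(dia("b", S))])) for S in subsets}
    if X2 == X:
        break
    X = X2

phi = frozenset(dia("c", states))   # [[<c>true]] = {q2}
print(sorted(X[phi]))               # ['q0', 'q2']: a^n b^n paths into {q2}
```

On this LTS, q2 is reached with n = 0 and q0 with n = 1 (one a followed by one b), so the result is {q0, q2}.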

Following [19], we exclude meaningless formulas such as (⟨a⟩**true**)+1 by using a simple type system. The types •, int, and σ → τ describe propositions, integers, and (monotonic) functions from σ to τ, respectively. Note that the integer type int may occur only in argument positions; this restriction is required to ensure that least and greatest fixpoints are well-defined. The typing rules for formulas are given in Fig. 2. In the figure, Δ denotes a type environment, which is a finite map from variables to (extended) types. Below we consider only well-typed formulas.

### **2.2 Semantics and HFL<sup>Z</sup> Model Checking**

We now define the formal semantics of HFL**<sup>Z</sup>** formulas. A *labeled transition system* (LTS) is a quadruple L = (U, A, −→, s_init), where U is a finite set of states, A is a finite set of actions, −→ ⊆ U × A × U is a labeled transition relation, and s_init ∈ U is the initial state. We write s₁ −a→ s₂ when (s₁, a, s₂) ∈ −→.

For an LTS L = (U, A, −→, s_init) and an extended type σ, we define the partially ordered set (D_{L,σ}, ⊑_{L,σ}) inductively by:

$$\begin{array}{l} \mathcal{D}\_{L,\bullet} = 2^{U} \qquad \sqsubseteq\_{L,\bullet} = \subseteq \qquad \mathcal{D}\_{L,\mathsf{int}} = \mathbf{Z} \qquad \sqsubseteq\_{L,\mathsf{int}} = \{(n,n) \mid n \in \mathbf{Z}\} \\ \mathcal{D}\_{L,\sigma \to \tau} = \{f \in \mathcal{D}\_{L,\sigma} \to \mathcal{D}\_{L,\tau} \mid \forall x, y.\, (x \sqsubseteq\_{L,\sigma} y \Rightarrow f\,x \sqsubseteq\_{L,\tau} f\,y)\} \\ \sqsubseteq\_{L,\sigma \to \tau} = \{(f,g) \mid \forall x \in \mathcal{D}\_{L,\sigma}.\, f(x) \sqsubseteq\_{L,\tau} g(x)\} \end{array}$$


**Fig. 2.** Typing rules for HFL**<sup>Z</sup>** formulas

Note that (D_{L,τ}, ⊑_{L,τ}) forms a complete lattice (but (D_{L,int}, ⊑_{L,int}) does not). We write ⊥_{L,τ} and ⊤_{L,τ} for the least and greatest elements of D_{L,τ} (which are λx̃.∅ and λx̃.U respectively). We sometimes omit the subscript L below. Let ⟦Δ⟧_L be the set of functions (called *valuations*) that map X to an element of D_{L,σ} for each X : σ ∈ Δ. For an HFL formula ϕ such that Δ ⊢_H ϕ : σ, we define ⟦Δ ⊢_H ϕ : σ⟧_L as a map from ⟦Δ⟧_L to D_{L,σ}, by induction on the derivation<sup>4</sup> of Δ ⊢_H ϕ : σ, as follows.


<sup>4</sup> Note that the derivation of each judgment Δ ⊢_H ϕ : σ is unique if there is any.

Here, ⟦op⟧ denotes the binary function on integers represented by op, and ⟦p⟧ denotes the k-ary relation on integers represented by p. The least/greatest fixpoint operators **lfp**_{L,τ} and **gfp**_{L,τ} are defined by **lfp**_{L,τ}(f) = ⊓_{L,τ}{x ∈ D_{L,τ} | f(x) ⊑_{L,τ} x} and **gfp**_{L,τ}(f) = ⊔_{L,τ}{x ∈ D_{L,τ} | x ⊑_{L,τ} f(x)}. Here, ⊔_{L,τ} and ⊓_{L,τ} respectively denote the least upper bound and the greatest lower bound with respect to ⊑_{L,τ}. We often omit the subscript L and write ⟦Δ ⊢_H ϕ : σ⟧ for ⟦Δ ⊢_H ϕ : σ⟧_L. For a closed formula, i.e., a formula well-typed under the empty type environment ∅, we often write ⟦ϕ⟧_L or just ⟦ϕ⟧ for ⟦∅ ⊢_H ϕ : σ⟧_L(∅).

*Example 2.* For the LTS L_file in Fig. 1, we have:

$$\begin{aligned} & [\![\nu X^{\bullet}.(\langle \mathtt{close} \rangle \mathbf{true} \land \langle \mathtt{read} \rangle X)]\!] = \\ & \mathbf{gfp}\_{L,\bullet}(\lambda x \in \mathcal{D}\_{L,\bullet}.\, [\![X : \bullet \vdash \langle \mathtt{close} \rangle \mathbf{true} \land \langle \mathtt{read} \rangle X : \bullet]\!](\{X \mapsto x\})) = \{q\_0\}. \end{aligned}$$

In fact, x = {q₀} ∈ D_{L,•} satisfies the equation ⟦X : • ⊢ ⟨close⟩**true** ∧ ⟨read⟩X : •⟧_L({X ↦ x}) = x, and x = {q₀} is the greatest such element.

Consider the following LTS L₁:

and ϕ_ab (⟨c⟩**true**), where ϕ_ab is the formula introduced in Example 1. Then, ⟦ϕ_ab (⟨c⟩**true**)⟧_{L₁} = {q₀, q₂}.

**Definition 1 (HFL<sup>Z</sup> model checking).** *For a closed formula* ϕ *of type* •*, we write* L, s ⊨ ϕ *if* s ∈ ⟦ϕ⟧_L*, and write* L ⊨ ϕ *if* s_init ∈ ⟦ϕ⟧_L*.* HFL**<sup>Z</sup>** model checking *is the problem of, given* L *and* ϕ*, deciding whether* L ⊨ ϕ *holds.*

The HFL**<sup>Z</sup>** model checking problem is *un*decidable, due to the presence of integers; in fact, the semantic domain D_{L,σ} is not finite for σ containing int. The undecidability is obtained as a corollary of the soundness and completeness of the reduction from the may-reachability problem to HFL model checking discussed in Sect. 5. For the fragment of pure HFL (i.e., HFL**<sup>Z</sup>** without integers, which we write HFL<sup>∅</sup> below), the model checking problem is decidable [42].

The *order* of an HFL**<sup>Z</sup>** model checking problem L ⊨? ϕ is the highest order of the types of subformulas of ϕ, where the order of a type is defined by: order(•) = order(int) = 0 and order(σ → τ) = max(order(σ)+1, order(τ)). The complexity of order-k HFL<sup>∅</sup> model checking is k-EXPTIME-complete [1], but polynomial in the size of the HFL formula under the assumption that the other parameters (the size of the LTS and the largest size of types used in formulas) are fixed [19].

*Remark 1.* Though we do not have quantifiers over integers as primitives, we can encode them using fixpoint operators. Given a formula ϕ : int → •, we can express ∃x : int.ϕ(x) and ∀x : int.ϕ(x) by (μX^{int→•}.λx : int. ϕ(x) ∨ X(x − 1) ∨ X(x + 1)) 0 and (νX^{int→•}.λx : int. ϕ(x) ∧ X(x − 1) ∧ X(x + 1)) 0 respectively.
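To see why the μ-encoding of ∃ works, note that the k-th Kleene approximant of the fixpoint, applied to 0, checks ϕ on every integer within distance k of 0, so the full fixpoint sweeps all of **Z**. A bounded Python approximation of this unfolding (the helper `exists_int` is our own device, sound for positive answers only):

```python
def exists_int(phi, depth):
    # The depth-th Kleene approximant of
    #   (mu X. lam x. phi(x) or X(x-1) or X(x+1)) 0
    # checks phi on every integer within distance `depth` of 0. It
    # under-approximates the exact least fixpoint: True is definite,
    # while False only means "no witness found within the bound".
    return any(phi(x) for x in range(-depth, depth + 1))

print(exists_int(lambda x: x * x == 49, 10))   # True: witness x = 7 (or -7)
print(exists_int(lambda x: x * x == 50, 10))   # False: no witness in [-10, 10]
```

The dual ν-encoding of ∀ would, symmetrically, be refuted by any counterexample reachable from 0 after finitely many unfoldings.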

### **2.3 HES**

As in [19], we often write an HFL**<sup>Z</sup>** formula as a sequence of fixpoint equations, called a *hierarchical equation system* (HES).

**Definition 2.** *An (extended)* hierarchical equation system *(HES) is a pair* (E, ϕ)*, where* E *is a sequence of fixpoint equations of the form* X₁^{τ₁} =_{α₁} ϕ₁; ··· ; Xₙ^{τₙ} =_{αₙ} ϕₙ*, where* αᵢ ∈ {μ, ν}*. We assume that* X₁ : τ₁, ..., Xₙ : τₙ ⊢_H ϕᵢ : τᵢ *holds for each* i ∈ {1,...,n}*, and that* ϕ₁,...,ϕₙ, ϕ *do not contain any fixpoint operators.*

The HES Φ = (E, ϕ) represents the HFL**<sup>Z</sup>** formula *toHFL*(E, ϕ) defined inductively by: *toHFL*(ε, ϕ) = ϕ and *toHFL*((E; X^τ =_α ϕ′), ϕ) = *toHFL*([αX^τ.ϕ′/X]E, [αX^τ.ϕ′/X]ϕ). Conversely, every HFL**<sup>Z</sup>** formula can easily be converted to an equivalent HES. In the rest of the paper, we often represent an HFL**<sup>Z</sup>** formula in the form of an HES, and just call it an HFL**<sup>Z</sup>** formula. We write ⟦Φ⟧ for ⟦*toHFL*(Φ)⟧. An HES (X₁^{τ₁} =_{α₁} ϕ₁; ··· ; Xₙ^{τₙ} =_{αₙ} ϕₙ, ϕ) can be normalized to (X₀^{τ₀} =_ν ϕ; X₁^{τ₁} =_{α₁} ϕ₁; ··· ; Xₙ^{τₙ} =_{αₙ} ϕₙ, X₀), where τ₀ is the type of ϕ. Thus, we sometimes call just a sequence of equations X₀^{τ₀} =_ν ϕ; X₁^{τ₁} =_{α₁} ϕ₁; ··· ; Xₙ^{τₙ} =_{αₙ} ϕₙ an HES, with the understanding that the "main formula" is the first variable X₀. Also, we often write X^τ x₁ ··· x_k =_α ϕ for the equation X^τ =_α λx₁. ··· λx_k.ϕ. We often omit type annotations and just write X =_α ϕ for X^τ =_α ϕ.

*Example 3.* The formula νX.μY.⟨b⟩X ∨ ⟨a⟩Y (which means that the current state has a transition sequence of the form (a\*b)^ω) is expressed as the following HES:

$$((X =\_\nu Y; Y =\_\mu \langle \mathbf{b} \rangle X \lor \langle \mathbf{a} \rangle Y), \quad X).$$
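The conversion *toHFL* can be sketched as a naive textual substitution. The Python toy below is our own, not the paper's algorithm; it assumes variable names are unique tokens that occur nowhere else as substrings. Run on the HES of Example 3, it reproduces νX.μY.⟨b⟩X ∨ ⟨a⟩Y up to extra parentheses:

```python
def to_hfl(eqs, main):
    # eqs: fixpoint equations (var, "mu"/"nu", body) in declaration order.
    # toHFL processes the *last* equation first: close its body under its
    # own binder, then substitute the closed formula for the variable in
    # the remaining equations and in the main formula.
    sym = {"mu": "μ", "nu": "ν"}
    while eqs:
        x, alpha, body = eqs.pop()
        closed = "(" + sym[alpha] + x + "." + body + ")"
        eqs = [(y, a, b.replace(x, closed)) for (y, a, b) in eqs]
        main = main.replace(x, closed)
    return main

# Example 3: (X =_nu Y; Y =_mu <b>X ∨ <a>Y) with main formula X
print(to_hfl([("X", "nu", "Y"), ("Y", "mu", "⟨b⟩X ∨ ⟨a⟩Y")], "X"))
# → (νX.(μY.⟨b⟩X ∨ ⟨a⟩Y))
```

Occurrences of X left inside the body when the νX binder is wrapped around it become bound, which is exactly the capture intended by the substitution [αX^τ.ϕ′/X].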

### **3 Warming Up**

To help readers get more familiar with HFL**<sup>Z</sup>** and the idea of reductions, we give here some variations of the examples of verification of file-accessing programs in Sect. 1, which are instances of the "resource usage verification problem" [15]. General reductions will be discussed in Sects. 5, 6 and 7, after the target language is set up in Sect. 4.

Consider the following OCaml-like program, which uses exceptions.

```
let readex x = read x; (if * then () else raise Eof) in
let rec f x = readex x; f x in
let d = open_in "foo" in try f d with Eof -> close d
```

Here, * represents a non-deterministic Boolean value. The function readex reads the file pointer x, and then non-deterministically raises an end-of-file (Eof) exception. The main expression (on the third line) first opens the file "foo", calls f to read the file repeatedly, and closes the file upon an end-of-file exception. Suppose, as in the example of Sect. 1, we wish to verify that the file "foo" is accessed following the protocol in Fig. 1.

First, we can remove exceptions by representing an exception handler as a special continuation [6]:

```
let readex x h k = read x; (if * then k() else h()) in
let rec f x h k = readex x h (fun _ -> f x h k) in
let d = open_in "foo" in f d (fun _ -> close d) (fun _ -> ())
```
Here, we have added to each function two parameters h and k, which represent an exception handler and a (normal) continuation respectively.

Let Φ be (E, F **true** (λr.⟨close⟩**true**) (λr.**true**)), where E is:

> *Readex* x h k =_ν ⟨read⟩(k **true** ∧ h **true**); F x h k =_ν *Readex* x h (λr.F x h k).

Here, we have just replaced read/close operations with the modal operators ⟨read⟩ and ⟨close⟩, non-deterministic choice with logical conjunction, and the unit value ( ) with **true**. Then, L_file ⊨ Φ if and only if the program performs only valid accesses to the file (e.g., it does not access the file after a close operation), where L_file is the LTS shown in Fig. 1. The correctness of the reduction can be understood informally by observing that there is a close correspondence between reductions of the program and those of the HFL formula above: when the program reaches a read command read x, the corresponding formula is of the form ⟨read⟩···, meaning that the read operation is valid in the current state; a similar condition holds for close operations. We will present a general translation and prove its correctness in Sect. 6.

Let us consider another example, which uses integers:

```
let rec f y x k = if y=0 then (close x; k())
                  else (read x; f (y-1) x k) in
let d = open_in "foo" in f n d (fun _ -> ())
```
Here, n is an integer constant. The function f reads x y times, and then calls the continuation k. Let L′_file be the LTS obtained by adding to L_file a new state q₂ and the transition q₁ −end→ q₂ (which intuitively means that a program is allowed to terminate in the state q₁), and let Φ′ be (E′, F n **true** (λr.⟨end⟩**true**)), where E′ is:

$$F\ y\ x\ k =\_{\mu} (y = 0 \Rightarrow \langle \mathtt{close} \rangle (k\,\mathbf{true})) \land (y \neq 0 \Rightarrow \langle \mathtt{read} \rangle (F\ (y-1)\ x\ k)).$$

Here, p(ϕ₁,...,ϕ_k) ⇒ ϕ is an abbreviation of ¬p(ϕ₁,...,ϕ_k) ∨ ϕ. Then, L′_file ⊨ Φ′ if and only if (i) the program performs only valid accesses to the file, (ii) it eventually terminates, and (iii) the file is closed when the program terminates. Notice the use of μ instead of ν above; by using μ, we can express liveness properties. The property L′_file ⊨ Φ′ indeed holds for n ≥ 0, but not for n < 0. In fact, F n x k is equivalent to **false** for n < 0, and to ⟨read⟩^n⟨close⟩(k **true**) for n ≥ 0.
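Since each unfolding of F strictly decreases y, the μ-formula can be evaluated on L′_file by direct recursion for n ≥ 0. The Python sketch below is our own; the LTS transcribes L′_file, and the `fuel` bound is our device for approximating the least-fixpoint semantics when the descent does not terminate (n < 0):

```python
states = {"q0", "q1", "q2"}               # L'_file: Fig. 1 plus q1 --end--> q2
trans = {("q0", "read", "q0"), ("q0", "close", "q1"), ("q1", "end", "q2")}

def dia(a, S):
    # [[<a>phi]]: states with an a-transition into S
    return {s for (s, l, t) in trans if l == a and t in S}

k_true = dia("end", states)               # [[(lam r. <end>true) true]] = {q1}

def F(y, k, fuel=100):
    # Unfolds F y x k =_mu (y=0 => <close>(k true)) /\ (y!=0 => <read>(F (y-1) x k)).
    # The fuel bound stands in for the mu-semantics: a descent that never
    # reaches y = 0 (i.e. y < 0) is approximated by the empty set (false).
    if fuel == 0:
        return set()
    if y == 0:
        return dia("close", k)
    return dia("read", F(y - 1, k, fuel - 1))

print(F(3, k_true))    # {'q0'}: the initial state satisfies Phi' for n >= 0
print(F(-1, k_true))   # set(): false for n < 0
```

For n ≥ 0 the recursion bottoms out at y = 0 and yields {q0}, matching ⟨read⟩^n⟨close⟩(k **true**); for n < 0 the fuel runs out and the empty set (false) propagates upward.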

### **4 Target Language**

This section sets up, as the target of program verification, a call-by-name<sup>5</sup> higher-order functional language extended with events. The language is essentially the same as the one used by Watanabe et al. [43] for discussing fair non-termination.

### **4.1 Syntax and Typing**

We assume a finite set **Ev** of names called *events*, ranged over by a, and a denumerable set of variables, ranged over by x, y, .... Events are used to express temporal properties of programs. We write x̃ (t̃, resp.) for a sequence of variables (terms, resp.), and write |x̃| for the length of the sequence. A *program* is a pair (D, t) consisting of a set D of function definitions {f₁ x̃₁ = t₁, ..., fₙ x̃ₙ = tₙ} and a term t. The set of *terms*, ranged over by t, is defined by:

$$t ::= (\,) \mid x \mid n \mid t\_1 \mathbin{op} t\_2 \mid t\_1\, t\_2 \mid t\_1 \Box t\_2 \mid \mathbf{event}\ a; t \mid \mathbf{if}\ p(t'\_1, \ldots, t'\_k)\ \mathbf{then}\ t\_1\ \mathbf{else}\ t\_2.$$

Here, n and p range over the sets of integers and integer predicates, as in HFL formulas. The expression **event** a; t raises an event a, and then evaluates t. Events are used to encode program properties of interest. For example, an assertion **assert**(b) can be expressed as **if** b **then** ( ) **else** (**event** fail; Ω), where fail is an event that expresses an assertion failure and Ω is a non-terminating term. If program termination is of interest, one can insert "**event** end" at every termination point and check whether an end event occurs. The expression t₁ □ t₂ evaluates t₁ or t₂ in a non-deterministic manner; it can be used to model, e.g., unknown inputs from the environment. We use the metavariable P for programs. When P = (D, t) with D = {f₁ x̃₁ = t₁, ..., fₙ x̃ₙ = tₙ}, we write **funs**(P) for {f₁,...,fₙ} (i.e., the set of function names defined in P). Using λ-abstractions, we sometimes write f = λx̃.t for the function definition f x̃ = t. We also regard D as a map from function names to terms, and write *dom*(D) for {f₁,...,fₙ} and D(fᵢ) for λx̃ᵢ.tᵢ.

Any program (D, t) can be normalized to (D ∪ {**main** <sup>=</sup> t}, **main**) where **main** is a name for the "main" function. We sometimes write just D for a program (D, **main**), with the understanding that D contains a definition of **main**.

We restrict the syntax of expressions using a type system. The set of *simple types*, ranged over by κ, is defined by:

$$\kappa ::= \star \mid \eta \to \kappa \qquad\qquad \eta ::= \kappa \mid \mathsf{int}.$$

The types ⋆, int, and η → κ describe the unit value, integers, and functions from η to κ respectively. Note that int is allowed to occur only in argument

<sup>5</sup> Call-by-value programs can be handled by applying the CPS transformation before applying the reductions to HFL model checking.

positions. We defer the typing rules to [23], as they are standard, except that we require that the right-hand side of each function definition must have type ⋆; this restriction, as well as the restriction that int occurs only in argument positions, does not lose generality, as those conditions can be ensured by applying CPS transformation. We consider below only well-typed programs.

#### **4.2 Operational Semantics**

We define the labeled transition relation t −ℓ→_D t′, where ℓ is either ε or an event name, as the least relation closed under the rules in Fig. 3. We implicitly assume that the program (D, t) is well-typed; this assumption is maintained throughout reductions by the standard type preservation property. In the rules for if-expressions, ⟦t′ᵢ⟧ represents the integer value denoted by t′ᵢ; note that the well-typedness of (D, t) guarantees that each t′ᵢ is an arithmetic expression consisting of integers and integer operations, so ⟦t′ᵢ⟧ is well defined. We often omit the subscript D when it is clear from the context. We write t −ℓ₁···ℓ_k→\*_D t′ if t −ℓ₁→_D ··· −ℓ_k→_D t′. Here, ε is treated as the empty sequence; thus, for example, we write t −ab→\*_D t′ if t −a→_D −ε→_D −b→_D −ε→_D t′.

$$\begin{array}{ccc} \mathbf{event}\ a; t \xrightarrow{a}\_{D} t & \dfrac{f\,\widetilde{x} = u \in D \qquad |\widetilde{x}| = |\widetilde{t}|}{f\ \widetilde{t} \xrightarrow{\epsilon}\_{D} [\widetilde{t}/\widetilde{x}]u} & \dfrac{([\![t'\_1]\!], \ldots, [\![t'\_k]\!]) \in [\![p]\!]}{\mathbf{if}\ p(t'\_1, \ldots, t'\_k)\ \mathbf{then}\ t\_1\ \mathbf{else}\ t\_2 \xrightarrow{\epsilon}\_{D} t\_1} \\[2ex] \dfrac{i \in \{1, 2\}}{t\_1 \Box t\_2 \xrightarrow{\epsilon}\_{D} t\_i} & & \dfrac{([\![t'\_1]\!], \ldots, [\![t'\_k]\!]) \notin [\![p]\!]}{\mathbf{if}\ p(t'\_1, \ldots, t'\_k)\ \mathbf{then}\ t\_1\ \mathbf{else}\ t\_2 \xrightarrow{\epsilon}\_{D} t\_2} \end{array}$$

**Fig. 3.** Labeled transition semantics

For a program P = (D, t₀), we define the set **Traces**(P) (⊆ **Ev**\* ∪ **Ev**^ω) of *traces* by:

$$\begin{split} \mathbf{Traces}(D, t\_0) &= \{ \ell\_0 \cdots \ell\_{n-1} \in (\{\epsilon\} \cup \mathbf{Ev})^\* \mid \forall i \in \{0, \ldots, n-1\}.\, t\_i \xrightarrow{\ell\_i}\_D t\_{i+1} \} \\ &\cup \{ \ell\_0 \ell\_1 \cdots \in (\{\epsilon\} \cup \mathbf{Ev})^{\omega} \mid \forall i \in \omega.\, t\_i \xrightarrow{\ell\_i}\_D t\_{i+1} \}. \end{split}$$

Note that since the label ε is regarded as the empty sequence, ℓ₀ℓ₁ℓ₂ = aa if ℓ₀ = ℓ₂ = a and ℓ₁ = ε, and an element of ({ε} ∪ **Ev**)^ω may be regarded as an element of **Ev**\* ∪ **Ev**^ω. We write **FinTraces**(P) and **InfTraces**(P) for **Traces**(P) ∩ **Ev**\* and **Traces**(P) ∩ **Ev**^ω respectively. The set of *full traces* **FullTraces**(D, t₀) (⊆ **Ev**\* ∪ **Ev**^ω) is defined as:

$$\begin{split} \{\ell\_0 \cdots \ell\_{n-1} \in (\{\epsilon\} \cup \mathbf{Ev})^\* \mid t\_n = (\ ) \land \forall i \in \{0, \ldots, n-1\}. t\_i \xrightarrow{\ell\_i}\_{D} t\_{i+1} \} \\ \cup \{\ell\_0 \ell\_1 \cdots \in (\{\epsilon\} \cup \mathbf{Ev})^\omega \mid \forall i \in \omega. t\_i \xrightarrow{\ell\_i}\_{D} t\_{i+1} \}. \end{split}$$

*Example 4.* The last example in Sect. 1 is modeled as P*file* = (D, f ( )), where D = {f x = (**event** close; ( )) □ (**event** read; **event** read; f x)}. We have:

**Traces**(P) = {read<sup>n</sup> | n ≥ 0} ∪ {read<sup>2n</sup>close | n ≥ 0} ∪ {read<sup>ω</sup>}
**FinTraces**(P) = {read<sup>n</sup> | n ≥ 0} ∪ {read<sup>2n</sup>close | n ≥ 0}
**InfTraces**(P) = {read<sup>ω</sup>}
**FullTraces**(P) = {read<sup>2n</sup>close | n ≥ 0} ∪ {read<sup>ω</sup>}.
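For small bounds, the finite traces of P*file* can be enumerated mechanically by unfolding the labeled transition rules breadth-first. The following Python sketch uses an ad hoc term encoding of our own (a tuple of pending events ending in `"unit"` or `"f"`), which is not from the paper:

```python
from collections import deque

# Enumerate the finite traces (of length <= max_len) of
# P_file = (D, f ()), D = {f x = (event close; ()) [] (event read; event read; f x)}.
def fin_traces(max_len):
    traces = set()
    queue = deque([((), ("f",))])  # (events emitted so far, remaining term)
    while queue:
        tr, term = queue.popleft()
        traces.add(tr)
        if len(tr) >= max_len:
            continue  # bound the enumeration
        if term[0] == "f":
            # unfold f into the two branches of the non-deterministic choice
            queue.append((tr, ("close", "unit")))
            queue.append((tr, ("read", "read", "f")))
        elif term[0] == "unit":
            continue  # terminated: ( ) has no transitions
        else:
            # emit the pending event at the head of the term
            queue.append((tr + (term[0],), term[1:]))
    return traces
```

The enumerated sets agree with the characterization above: close appears only after an even number of reads.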

### **5 May/Must-Reachability Verification**

Here we consider the following problems:

- *May-reachability:* "Given a program P and an event a, may P raise a?"
- *Must-reachability:* "Given a program P and an event a, does P eventually raise a, no matter how the non-deterministic choice □ is resolved?"
Since we are interested in a particular event a, we restrict here the event set **Ev** to a singleton set of the form {a}. Then, may-reachability is formalized as: does a ∈ **Traces**(P) hold? Must-reachability is formalized as: does every trace in **FullTraces**(P) contain a? We encode both problems into the validity of HFL**<sup>Z</sup>** formulas (without any modal operators ⟨a⟩ or [a]), or, equivalently, the HFL**<sup>Z</sup>** model checking of those formulas against a trivial model (which consists of a single state without any transitions). Since our reductions are sound and complete, the characterizations of their negations (non-reachability and may-non-reachability) can also be obtained immediately. Although these are the simplest classes of properties among those discussed in Sects. 5, 6 and 7, they are already large enough to accommodate many program properties discussed in the literature, including lack of assertion failures/uncaught exceptions [22] (which can be characterized as non-reachability; recall the encoding of assertions in Sect. 4), termination [27,29] (characterized as must-reachability), and non-termination [26] (characterized as may-non-reachability).

### **5.1 May-Reachability**

As in the examples in Sect. 3, we translate a program to a formula that says "the program may raise an event a" in a compositional manner. For example, **event** a; t can be translated to **true** (since the event will surely be raised immediately), and t<sub>1</sub> □ t<sub>2</sub> can be translated to t<sub>1</sub><sup>†</sup> ∨ t<sub>2</sub><sup>†</sup>, where t<sub>i</sub><sup>†</sup> is the result of translating t<sub>i</sub> (since only one of t<sub>1</sub> and t<sub>2</sub> needs to raise an event).

**Definition 3.** *Let* P = (D, t) *be a program.* Φ<sub>P,may</sub> *is the HES* (D<sup>†may</sup>, t<sup>†may</sup>)*, where* D<sup>†may</sup> *and* t<sup>†may</sup> *are defined by:*

$$\begin{array}{l} \{f\_1\ \widetilde{x}\_1 = t\_1, \ldots, f\_n\ \widetilde{x}\_n = t\_n\}^{\dagger\_{may}} = (f\_1\ \widetilde{x}\_1 =\_\mu t\_1^{\dagger\_{may}}; \cdots; f\_n\ \widetilde{x}\_n =\_\mu t\_n^{\dagger\_{may}}) \\\\ (\ )^{\dagger\_{may}} = \textbf{false} \qquad x^{\dagger\_{may}} = x \qquad n^{\dagger\_{may}} = n \qquad (t\_1\ op\ t\_2)^{\dagger\_{may}} = t\_1^{\dagger\_{may}}\ op\ t\_2^{\dagger\_{may}} \\\\ (\textbf{if}\ p(t'\_1, \ldots, t'\_k)\ \textbf{then}\ t\_1\ \textbf{else}\ t\_2)^{\dagger\_{may}} = (p(t'\_1{}^{\dagger\_{may}}, \ldots, t'\_k{}^{\dagger\_{may}}) \land t\_1^{\dagger\_{may}}) \lor (\neg p(t'\_1{}^{\dagger\_{may}}, \ldots, t'\_k{}^{\dagger\_{may}}) \land t\_2^{\dagger\_{may}}) \\\\ (\textbf{event}\ a; t)^{\dagger\_{may}} = \textbf{true} \qquad (t\_1\ t\_2)^{\dagger\_{may}} = t\_1^{\dagger\_{may}}\ t\_2^{\dagger\_{may}} \qquad (t\_1 \mathbin{\Box} t\_2)^{\dagger\_{may}} = t\_1^{\dagger\_{may}} \lor t\_2^{\dagger\_{may}} \end{array}$$

Note that, in the definition of D†*may* , the order of function definitions in D does not matter (i.e., the resulting HES is unique up to the semantic equality), since all the fixpoint variables are bound by μ.
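The translation of Definition 3 is a straightforward structural recursion over terms. The following Python sketch implements it over a tuple-encoded AST; the encoding is our own, not from the paper, and formulas are returned in the same tuple style:

```python
# t |-> t^{dagger_may} from Definition 3, over a hypothetical tuple-encoded AST.
def may(t):
    tag = t[0]
    if tag == "unit":                 # ( )  ->  false
        return ("false",)
    if tag in ("var", "int"):         # x -> x,  n -> n
        return t
    if tag == "op":                   # t1 op t2, translated componentwise
        _, op, t1, t2 = t
        return ("op", op, may(t1), may(t2))
    if tag == "if":                   # if p(t'1,...,t'k) then t1 else t2
        _, p, args, t1, t2 = t
        pred = ("pred", p, tuple(may(a) for a in args))
        return ("or", ("and", pred, may(t1)),
                      ("and", ("not", pred), may(t2)))
    if tag == "event":                # event a; t  ->  true
        return ("true",)
    if tag == "app":                  # application is translated componentwise
        _, t1, t2 = t
        return ("app", may(t1), may(t2))
    if tag == "choice":               # t1 [] t2  ->  t1^dag \/ t2^dag
        _, t1, t2 = t
        return ("or", may(t1), may(t2))
    raise ValueError(tag)
```

For instance, the non-deterministic choice between an event and ( ) becomes a disjunction of **true** and **false**.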

*Example 5.* Consider the program:

$$P\_{loop} = (\{loop\ x = loop\ x\}, loop(\textbf{event\ a}; ())).$$

It is translated to the HES <sup>Φ</sup>*loop* = (*loop* <sup>x</sup> <sup>=</sup><sup>μ</sup> *loop* x, *loop*(**true**)). Since *loop* <sup>≡</sup> <sup>μ</sup>*loop*.λx.*loop* <sup>x</sup> is equivalent to λx.**false**, <sup>Φ</sup>*loop* is equivalent to **false**. In fact, <sup>P</sup>*loop* never raises an event <sup>a</sup> (recall that our language is call-by-name).

*Example 6.* Consider the program <sup>P</sup>*sum* = (D*sum*, **main**) where <sup>D</sup>*sum* is:

$$\begin{array}{l} \textbf{main} = \textit{sum} \ n \ (\lambda r. \textbf{assert}(r \ge n)) \\\quad \textit{sum } x \ k = \textbf{if} \ x = 0 \ \textbf{then} \ k \ 0 \ \textbf{else} \ sum \ (x - 1) \ (\lambda r. k(x + r)) \end{array}$$

Here, n is some integer constant, and **assert**(b) is the macro introduced in Sect. 4. We have used λ-abstractions for the sake of readability. The function *sum* is a CPS version of a function that computes the summation of integers from 1 to x. The main function computes the sum r = 1 + ··· + n, and asserts r ≥ n. It is translated to the HES Φ<sub>P<sub>sum</sub>,may</sub> = (E*sum*, **main**), where E*sum* is:

$$\begin{array}{l} \textbf{main} =\_\mu sum\ n\ (\lambda r. (r \ge n \land \textbf{false}) \lor (r < n \land \textbf{true})); \\\\ sum\ x\ k =\_\mu (x = 0 \land k\ 0) \lor (x \ne 0 \land sum\ (x-1)\ (\lambda r. k(x+r))). \end{array}$$

Here, n is treated as a constant. Since the shape of the formula does not depend on the value of n, the property "an assertion failure may occur for some n" can be expressed by ∃n.Φ<sub>P<sub>sum</sub>,may</sub>.
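For a concrete n ≥ 0, the μ-recursion in E*sum* unwinds only finitely, so the formula can be evaluated by plain recursion. A Python sketch of this executable reading (ours, not from the paper):

```python
# Direct evaluation of Phi_{Psum,may} for a concrete n >= 0. Because both
# fixpoints are mu and the recursion in sum terminates for n >= 0, ordinary
# recursive evaluation computes the least fixpoint.
def phi_sum_may(n):
    def k0(r):
        # (r >= n /\ false) \/ (r < n /\ true), which simplifies to r < n
        return r < n
    def sum_f(x, k):
        return (x == 0 and k(0)) or (x != 0 and sum_f(x - 1, lambda r: k(x + r)))
    return sum_f(n, k0)
```

Since *sum* passes r = n(n+1)/2 ≥ n to the continuation for every n ≥ 0, the formula evaluates to false: for these n, the assertion can never fail.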

The following theorem states that <sup>Φ</sup>P,*may* is a complete characterization of the may-reachability of P.

**Theorem 1.** *Let* P *be a program. Then,* a ∈ **Traces**(P) *if and only if* L<sub>0</sub> |= Φ<sub>P,may</sub> *for* L<sub>0</sub> = ({s<sub>⋆</sub>}, ∅, ∅, s<sub>⋆</sub>)*.*

A proof of the theorem above is found in [23]; we only provide an outline here. We first show the theorem for recursion-free programs, and then lift it to arbitrary programs by using the continuity of the functions represented by the fixpoint-free fragment of HFL**<sup>Z</sup>** formulas. To show the theorem for recursion-free programs, we define the reduction relation t −→<sub>D</sub> t′ by:

$$\dfrac{f\ \widetilde{x} = u \in D \qquad |\widetilde{x}| = |\widetilde{t}|}{E[f\ \widetilde{t}] \longrightarrow\_D E[[\widetilde{t}/\widetilde{x}]u]} \qquad \dfrac{([t'\_1], \ldots, [t'\_k]) \in [p]}{E[\textbf{if}\ p(t'\_1, \ldots, t'\_k)\ \textbf{then}\ t\_1\ \textbf{else}\ t\_2] \longrightarrow\_D E[t\_1]}$$

$$\dfrac{([t'\_1], \ldots, [t'\_k]) \notin [p]}{E[\textbf{if}\ p(t'\_1, \ldots, t'\_k)\ \textbf{then}\ t\_1\ \textbf{else}\ t\_2] \longrightarrow\_D E[t\_2]}$$

Here, E ranges over the set of evaluation contexts given by E ::= [ ] | E t | t □ E | **event** a; E. The reduction relation differs from the labeled transition relation given in Sect. 4 in that □ and **event** a; ··· are not eliminated. By the definition of the translation, the theorem holds for programs in normal form (with respect to the reduction relation), and the semantics of translated HFL formulas is preserved by the reduction relation; thus the theorem holds for recursion-free programs, as they are strongly normalizing.

### **5.2 Must-Reachability**

The characterization of must-reachability can be obtained by an easy modification of the characterization of may-reachability: we just need to replace branches with logical conjunction.

**Definition 4.** *Let* P = (D, t) *be a program.* Φ<sub>P,must</sub> *is the HES* (D<sup>†must</sup>, t<sup>†must</sup>)*, where* D<sup>†must</sup> *and* t<sup>†must</sup> *are defined by:*

$$\begin{array}{l} \{f\_1\ \widetilde{x}\_1 = t\_1, \ldots, f\_n\ \widetilde{x}\_n = t\_n\}^{\dagger\_{must}} = (f\_1\ \widetilde{x}\_1 =\_\mu t\_1^{\dagger\_{must}}; \cdots; f\_n\ \widetilde{x}\_n =\_\mu t\_n^{\dagger\_{must}}) \\\\ (\ )^{\dagger\_{must}} = \textbf{false} \qquad x^{\dagger\_{must}} = x \qquad n^{\dagger\_{must}} = n \qquad (t\_1\ op\ t\_2)^{\dagger\_{must}} = t\_1^{\dagger\_{must}}\ op\ t\_2^{\dagger\_{must}} \\\\ (\textbf{if}\ p(t'\_1, \ldots, t'\_k)\ \textbf{then}\ t\_1\ \textbf{else}\ t\_2)^{\dagger\_{must}} = (p(t'\_1{}^{\dagger\_{must}}, \ldots, t'\_k{}^{\dagger\_{must}}) \Rightarrow t\_1^{\dagger\_{must}}) \land (\neg p(t'\_1{}^{\dagger\_{must}}, \ldots, t'\_k{}^{\dagger\_{must}}) \Rightarrow t\_2^{\dagger\_{must}}) \\\\ (\textbf{event}\ a; t)^{\dagger\_{must}} = \textbf{true} \qquad (t\_1\ t\_2)^{\dagger\_{must}} = t\_1^{\dagger\_{must}}\ t\_2^{\dagger\_{must}} \qquad (t\_1 \mathbin{\Box} t\_2)^{\dagger\_{must}} = t\_1^{\dagger\_{must}} \land t\_2^{\dagger\_{must}}. \end{array}$$

*Here,* p(ϕ1,...,ϕ<sup>k</sup>) <sup>⇒</sup> <sup>ϕ</sup> *is a shorthand for* <sup>¬</sup>p(ϕ1,...,ϕ<sup>k</sup>) <sup>∨</sup> <sup>ϕ</sup>*.*

*Example 7.* Consider P*loop* = (D, loop m n) where D is:

$$\begin{array}{l} loop\ x\ y = \textbf{if}\ x \le 0 \lor y \le 0\ \textbf{then}\ (\textbf{event}\ \textbf{end};\ (\ )) \\\\ \quad \textbf{else}\ (loop\ (x-1)\ (y\*y)) \mathbin{\Box} (loop\ x\ (y-1)) \end{array}$$

Here, the event end is used to signal the termination of the program. The function loop non-deterministically updates the values of x and y until either x or y becomes non-positive. The must-termination of the program is characterized by Φ<sub>P<sub>loop</sub>,must</sub> = (E, loop m n), where E is:

$$\begin{array}{l} loop\ x\ y =\_\mu (x \le 0 \lor y \le 0 \Rightarrow \textbf{true}) \\\\ \quad \land\ (\neg(x \le 0 \lor y \le 0) \Rightarrow (loop\ (x-1)\ (y\*y)) \land (loop\ x\ (y-1))). \end{array}$$
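Both conjuncts decrease the pair (x, y) lexicographically (the first decreases x, the second keeps x and decreases y), so for concrete arguments the μ-recursion above unwinds finitely and can be evaluated directly. A Python sketch of this executable reading (ours, not from the paper):

```python
# Direct evaluation of E for concrete integers. The mu-fixpoint is reached
# after finitely many unfoldings because (x, y) decreases lexicographically.
def loop_must(x, y):
    if x <= 0 or y <= 0:   # (x <= 0 \/ y <= 0) => true
        return True
    return loop_must(x - 1, y * y) and loop_must(x, y - 1)
```

For example, loop_must(2, 3) evaluates to True, witnessing that the program must eventually raise end from those inputs.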

We write **Must**<sub>a</sub>(P) if every π ∈ **FullTraces**(P) contains a. The following theorem, which can be proved in a manner similar to Theorem 1, guarantees that Φ<sub>P,must</sub> is indeed a sound and complete characterization of the must-reachability.

**Theorem 2.** *Let* P *be a program. Then,* **Must**<sub>a</sub>(P) *if and only if* L<sub>0</sub> |= Φ<sub>P,must</sub> *for* L<sub>0</sub> = ({s<sub>⋆</sub>}, ∅, ∅, s<sub>⋆</sub>)*.*

### **6 Trace Properties**

Here we consider the verification problem: "Given a (non-ω) regular language L and a program P, does *every* finite event sequence of P belong to L (i.e., does **FinTraces**(P) ⊆ L hold)?", and reduce it to an HFL**<sup>Z</sup>** model checking problem. The verification of file-accessing programs considered in Sect. 3 may be considered an instance of this problem.

Here we assume that the language L is closed under the prefix operation; this does not lose generality, because **FinTraces**(P) is also closed under the prefix operation. We write A<sub>L</sub> = (Q, Σ, δ, q<sub>0</sub>, F) for the minimal, deterministic automaton with no dead states (hence the transition function δ may be partial). Since L is prefix-closed and the automaton is minimal, w ∈ L if and only if δ̂(q<sub>0</sub>, w) is defined (where δ̂ is defined by: δ̂(q, ε) = q and δ̂(q, aw) = δ̂(δ(q, a), w)). We use the corresponding LTS L<sub>L</sub> = (Q, Σ, {(q, a, q′) | δ(q, a) = q′}, q<sub>0</sub>) as the model of the reduced HFL**<sup>Z</sup>** model checking problem.
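Concretely, membership w ∈ L amounts to running the partial transition function δ̂. The following Python sketch instantiates this for the prefix-closure of read<sup>∗</sup> · close · end used later in Example 8; the state names are our own:

```python
# Partial transition function of the minimal deterministic automaton for the
# prefix-closure of read* . close . end (state names q0, q1, q2 are ours).
delta = {
    ("q0", "read"): "q0",
    ("q0", "close"): "q1",
    ("q1", "end"): "q2",
}

def delta_hat(q, w):
    """delta_hat(q, eps) = q; delta_hat(q, a w) = delta_hat(delta(q, a), w).
    Returns None when delta is undefined along the way."""
    for a in w:
        q = delta.get((q, a))
        if q is None:
            return None
    return q

def in_L(w):
    # w is in L iff delta_hat(q0, w) is defined
    return delta_hat("q0", w) is not None
```

Since the automaton has no dead states and all its states are reachable, this partiality check is exactly membership in the prefix-closed language.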

Given the LTS L<sub>L</sub> above, whether an event sequence a<sub>1</sub> ··· a<sub>k</sub> belongs to L can be expressed as L<sub>L</sub> |= ⟨a<sub>1</sub>⟩···⟨a<sub>k</sub>⟩**true**. Whether all the event sequences in {a<sub>j,1</sub> ··· a<sub>j,k<sub>j</sub></sub> | j ∈ {1,...,n}} belong to L can be expressed as L<sub>L</sub> |= ⋀<sub>j∈{1,...,n}</sub> ⟨a<sub>j,1</sub>⟩···⟨a<sub>j,k<sub>j</sub></sub>⟩**true**. We can lift these translations for event sequences to a translation from a program (which can be considered a description of a set of event sequences) to an HFL**<sup>Z</sup>** formula, as follows.

**Definition 5.** *Let* P = (D, t) *be a program.* Φ<sub>P,path</sub> *is the HES* (D<sup>†path</sup>, t<sup>†path</sup>)*, where* D<sup>†path</sup> *and* t<sup>†path</sup> *are defined by:*

$$\begin{array}{l} \{f\_1\ \widetilde{x}\_1 = t\_1, \ldots, f\_n\ \widetilde{x}\_n = t\_n\}^{\dagger\_{path}} = (f\_1\ \widetilde{x}\_1 =\_\nu t\_1^{\dagger\_{path}}; \cdots; f\_n\ \widetilde{x}\_n =\_\nu t\_n^{\dagger\_{path}}) \\\\ (\ )^{\dagger\_{path}} = \textbf{true} \qquad x^{\dagger\_{path}} = x \qquad n^{\dagger\_{path}} = n \qquad (t\_1\ op\ t\_2)^{\dagger\_{path}} = t\_1^{\dagger\_{path}}\ op\ t\_2^{\dagger\_{path}} \\\\ (\textbf{if}\ p(t'\_1, \ldots, t'\_k)\ \textbf{then}\ t\_1\ \textbf{else}\ t\_2)^{\dagger\_{path}} = (p(t'\_1{}^{\dagger\_{path}}, \ldots, t'\_k{}^{\dagger\_{path}}) \Rightarrow t\_1^{\dagger\_{path}}) \land (\neg p(t'\_1{}^{\dagger\_{path}}, \ldots, t'\_k{}^{\dagger\_{path}}) \Rightarrow t\_2^{\dagger\_{path}}) \\\\ (\textbf{event}\ a; t)^{\dagger\_{path}} = \langle a \rangle t^{\dagger\_{path}} \qquad (t\_1\ t\_2)^{\dagger\_{path}} = t\_1^{\dagger\_{path}}\ t\_2^{\dagger\_{path}} \qquad (t\_1 \mathbin{\Box} t\_2)^{\dagger\_{path}} = t\_1^{\dagger\_{path}} \land t\_2^{\dagger\_{path}}. \end{array}$$

*Example 8.* The last program discussed in Sect. 3 is modeled as P<sub>2</sub> = (D<sub>2</sub>, f m g), where m is an integer constant and D<sub>2</sub> consists of:

$$\begin{array}{l} f\ y\ k = \textbf{if}\ y = 0\ \textbf{then}\ (\textbf{event}\ \mathtt{close};\ k\ (\ ))\ \textbf{else}\ (\textbf{event}\ \mathtt{read};\ f\ (y-1)\ k) \\\\ g\ r = \textbf{event}\ \mathtt{end}; (\ ) \end{array}$$

Here, we have modeled accesses to the file, as well as termination, as events. Then Φ<sub>P<sub>2</sub>,path</sub> = (E<sub>P<sub>2</sub>,path</sub>, f m g), where E<sub>P<sub>2</sub>,path</sub> is:<sup>6</sup>

$$\begin{array}{l} f\ y\ k =\_\nu (y = 0 \Rightarrow \langle \mathtt{close} \rangle (k\ \textbf{true})) \land (y \ne 0 \Rightarrow \langle \mathtt{read} \rangle (f\ (y-1)\ k)); \\\\ g\ r =\_\nu \langle \mathtt{end} \rangle \textbf{true}. \end{array}$$

Let L be the prefix-closure of read<sup>∗</sup> · close · end. Then L<sub>L</sub> is the LTS L*file* in Sect. 3, and **FinTraces**(P<sub>2</sub>) ⊆ L can be verified by checking L<sub>L</sub> |= Φ<sub>P<sub>2</sub>,path</sub>.

<sup>6</sup> Unlike in Sect. 3, the variables are bound by ν since we are not concerned with the termination property here.
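For a concrete m ≥ 0, the ν-recursion in E<sub>P<sub>2</sub>,path</sub> unwinds only finitely (the argument y decreases), so the check L<sub>L</sub> |= Φ<sub>P<sub>2</sub>,path</sub> can be evaluated directly over the LTS. A Python sketch, with our own encoding of L<sub>L</sub> (state names are assumptions, not from the paper):

```python
# The LTS L_L for the prefix-closure of read* . close . end (state names ours).
delta = {("q0", "read"): "q0", ("q0", "close"): "q1", ("q1", "end"): "q2"}

def dia(a, phi, q):
    """<a>phi at state q: some a-successor of q satisfies phi
    (delta is deterministic here, so there is at most one)."""
    q2 = delta.get((q, a))
    return q2 is not None and phi(q2)

def g(r, q):    # g r =nu <end> true
    return dia("end", lambda _: True, q)

def f(y, k, q): # f y k =nu (y=0 => <close>(k true)) /\ (y/=0 => <read>(f (y-1) k))
    c1 = (y != 0) or dia("close", lambda q2: k(True, q2), q)
    c2 = (y == 0) or dia("read", lambda q2: f(y - 1, k, q2), q)
    return c1 and c2

def check(m):   # L_L |= f m g, starting from the initial state q0
    return f(m, g, "q0")
```

check(m) holds for every m ≥ 0, matching **FinTraces**(P<sub>2</sub>) ⊆ L; a state with no read-successor (such as q1) falsifies the ⟨read⟩ conjunct.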

**Theorem 3.** *Let* P *be a program and* L *a regular, prefix-closed language. Then,* **FinTraces**(P) ⊆ L *if and only if* L<sub>L</sub> |= Φ<sub>P,path</sub>*.*

As in Sect. 5, we first prove the theorem for programs in normal form, and then lift it to recursion-free programs by using the preservation of the semantics of HFL**<sup>Z</sup>** formulas by reductions, and further to arbitrary programs by using the (co-)continuity of the functions represented by fixpoint-free HFL**<sup>Z</sup>** formulas. See [23] for a concrete proof.

### **7 Linear-Time Temporal Properties**

This section considers the following problem: "Given a program P and an ω-regular word language L, does **InfTraces**(P) ∩ L = ∅ hold?". From the viewpoint of program verification, L represents the set of "bad" behaviors. This can be considered an extension of the problems considered in the previous sections.

The reduction to HFL model checking is more involved than those in the previous sections. To see the difficulty, consider the program P<sup>0</sup>:

$$(\{f = \text{if } c \text{ then } (\text{event } \mathbf{a}; f) \text{ else } (\text{event } \mathbf{b}; f)\}, \quad f),$$

where c is some boolean expression. Let L be the complement of (a<sup>∗</sup>b)<sup>ω</sup>, i.e., the set of infinite sequences that contain only finitely many b's. Following Sect. 6 (and noting that **InfTraces**(P) ∩ L = ∅ is equivalent to **InfTraces**(P) ⊆ (a<sup>∗</sup>b)<sup>ω</sup> in this case), one may be tempted to prepare an LTS like the one in Fig. 4 (which corresponds to the transition function of a (parity) word automaton accepting (a<sup>∗</sup>b)<sup>ω</sup>), and translate the program to an HES Φ<sub>P<sub>0</sub></sub> of the form:

$$(f =\_{\alpha} (c \Rightarrow \langle \mathbf{a} \rangle f) \land (\neg c \Rightarrow \langle \mathbf{b} \rangle f), \quad f),$$

where α is μ or ν. However, such a translation would not work. If c = **true**, then **InfTraces**(P<sub>0</sub>) = {a<sup>ω</sup>}, hence **InfTraces**(P<sub>0</sub>) ∩ L ≠ ∅; thus, α should be μ for Φ<sub>P<sub>0</sub></sub> to be unsatisfied. If c = **false**, however, **InfTraces**(P<sub>0</sub>) = {b<sup>ω</sup>}, hence **InfTraces**(P<sub>0</sub>) ∩ L = ∅; thus, α must be ν for Φ<sub>P<sub>0</sub></sub> to be satisfied.

**Fig. 4.** LTS for (a<sup>∗</sup>b) ω

The example above suggests that we actually need to distinguish between the two occurrences of f in the body of f's definition. Note that in the then- and else-clauses respectively, f is called after different events a and b. This difference is important, since we are interested in whether b occurs infinitely often. We thus duplicate f, and replace the program with the following program P*dup*:

> ({f<sub>b</sub> = **if** c **then** (**event** a; f<sub>a</sub>) **else** (**event** b; f<sub>b</sub>),
> f<sub>a</sub> = **if** c **then** (**event** a; f<sub>a</sub>) **else** (**event** b; f<sub>b</sub>)}, f<sub>b</sub>).

For checking **InfTraces**(P<sub>0</sub>) ∩ L = ∅, it is now sufficient to check that f<sub>b</sub> is recursively called infinitely often. We can thus obtain the following HES:

$$((f\_b =\_\nu (c \Rightarrow \langle \mathbf{a} \rangle f\_a) \land (\neg c \Rightarrow \langle \mathbf{b} \rangle f\_b); \quad f\_a =\_\mu (c \Rightarrow \langle \mathbf{a} \rangle f\_a) \land (\neg c \Rightarrow \langle \mathbf{b} \rangle f\_b)), f\_b).$$

Note that f<sub>b</sub> and f<sub>a</sub> are bound by ν and μ respectively, reflecting the fact that b should occur infinitely often, but a need not. If c = **true**, the formula is equivalent to νf<sub>b</sub>.⟨a⟩(μf<sub>a</sub>.⟨a⟩f<sub>a</sub>), which is false. If c = **false**, then the formula is equivalent to νf<sub>b</sub>.⟨b⟩f<sub>b</sub>, which is satisfied by the LTS in Fig. 4.

The general translation is more involved due to the presence of higher-order functions but, as in the example above, the overall translation consists of two steps. We first replicate functions according to which events may occur between two recursive calls, and reduce the problem of whether **InfTraces**(P) ∩ L = ∅ holds to a problem of analyzing which functions are recursively called infinitely often, which we call a *call-sequence analysis*. We can then reduce the call-sequence analysis to HFL model checking in a rather straightforward manner (though the proof of correctness is non-trivial). The resulting HFL formula actually does not contain modal operators.<sup>7</sup> So, as in Sect. 5, the resulting problem is the validity checking of HFL formulas without modal operators.

In the rest of this section, we first introduce the call-sequence analysis problem and its reduction to HFL model checking in Sect. 7.1. We then show how to reduce the temporal verification problem **InfTraces**(P) ∩ L = ∅ to an instance of the call-sequence analysis problem in Sect. 7.2.

### **7.1 Call-Sequence Analysis**

We define the call-sequence analysis and reduce it to an HFL model-checking problem. As mentioned above, in the call-sequence analysis we are interested in analyzing which functions are *recursively called* infinitely often. Here, we say that g is *recursively called from* f if f s̃ −→<sub>D</sub> [s̃/x̃]t<sub>f</sub> −→<sup>∗</sup><sub>D</sub> g t̃, where f x̃ = t<sub>f</sub> ∈ D and g "originates from" t<sub>f</sub> (a more formal definition is given in Definition 6 below). For example, consider the following program P*app*, which is a twisted version of P*dup* above.

$$\begin{array}{l} (\{app\ h\ x = h\ x, \\\\ \ \ f\_b\ x = \textbf{if}\ x > 0\ \textbf{then}\ (\textbf{event}\ a;\ app\ f\_a\ (x-1))\ \textbf{else}\ (\textbf{event}\ b;\ app\ f\_b\ 5), \\\\ \ \ f\_a\ x = \textbf{if}\ x > 0\ \textbf{then}\ (\textbf{event}\ a;\ app\ f\_a\ (x-1))\ \textbf{else}\ (\textbf{event}\ b;\ app\ f\_b\ 5)\},\ f\_b\ 5). \end{array}$$

<sup>7</sup> In the example above, we can actually remove ⟨a⟩ and ⟨b⟩, as information about events has been taken into account when f was duplicated.

Then f<sub>a</sub> is "recursively called" from f<sub>b</sub> in f<sub>b</sub> 5 −→<sup>∗</sup><sub>D</sub> app f<sub>a</sub> 4 −→<sup>∗</sup><sub>D</sub> f<sub>a</sub> 4 (and so is app). We are interested in infinite chains of recursive calls f<sub>0</sub> f<sub>1</sub> f<sub>2</sub> ···, and in which functions may occur infinitely often in each chain. For instance, the program above has the unique infinite chain (f<sub>b</sub> f<sub>a</sub><sup>5</sup>)<sup>ω</sup>, in which both f<sub>a</sub> and f<sub>b</sub> occur infinitely often. (Besides the infinite chain, the program has finite chains like f<sub>b</sub> app; note that this chain cannot be extended further, as the body of app does not contain any occurrence of the recursive functions app, f<sub>a</sub>, and f<sub>b</sub>.) We define the notion of recursive calls and call sequences formally below.

**Definition 6 (Recursive call relation, call sequences).** *Let* P = (D, f<sub>1</sub> s̃) *be a program, with* D = {f<sub>i</sub> x̃<sub>i</sub> = u<sub>i</sub>}<sub>1≤i≤n</sub>*. We define* D<sup>♯</sup> := D ∪ {f<sub>i</sub><sup>♯</sup> x̃<sub>i</sub> = u<sub>i</sub>}<sub>1≤i≤n</sub>*, where* f<sub>1</sub><sup>♯</sup>,...,f<sub>n</sub><sup>♯</sup> *are fresh symbols. (Thus,* D<sup>♯</sup> *has two copies of each function symbol, one of which is marked by* ♯*.) For the terms* f<sub>i</sub> t̃<sub>i</sub> *and* f<sub>j</sub> t̃<sub>j</sub> *that do not contain marked symbols, we write* f<sub>i</sub> t̃<sub>i</sub> ⤳<sub>D</sub> f<sub>j</sub> t̃<sub>j</sub> *if (i)* [t̃<sub>i</sub>/x̃<sub>i</sub>][f<sub>1</sub><sup>♯</sup>/f<sub>1</sub>,...,f<sub>n</sub><sup>♯</sup>/f<sub>n</sub>]u<sub>i</sub> −→<sup>∗</sup><sub>D<sup>♯</sup></sub> f<sub>j</sub><sup>♯</sup> t̃′<sub>j</sub> *and (ii)* t̃<sub>j</sub> *is obtained by erasing all the marks in* t̃′<sub>j</sub>*. We write* **Callseq**(P) *for the set of (possibly infinite) sequences of function symbols:*

$$\{f\_1\ g\_1\ g\_2 \cdots \mid f\_1\ \widetilde{s} \leadsto\_D g\_1\ \widetilde{t}\_1 \leadsto\_D g\_2\ \widetilde{t}\_2 \leadsto\_D \cdots \}.$$

*We write* **InfCallseq**(P) *for the subset of* **Callseq**(P) *consisting of infinite sequences, i.e.,* **Callseq**(P) ∩ {f<sub>1</sub>,...,f<sub>n</sub>}<sup>ω</sup>*.*

For example, for P*app* above, **Callseq**(P) is the prefix closure of {(f<sub>b</sub> f<sub>a</sub><sup>5</sup>)<sup>ω</sup>} ∪ {s · app | s is a non-empty finite prefix of (f<sub>b</sub> f<sub>a</sub><sup>5</sup>)<sup>ω</sup>}, and **InfCallseq**(P) is the singleton set {(f<sub>b</sub> f<sub>a</sub><sup>5</sup>)<sup>ω</sup>}.

**Definition 7 (Call-sequence analysis).** *A* priority assignment *for a program* P *is a function* Ω : **funs**(P) <sup>→</sup> <sup>N</sup> *from the set of function symbols of* P *to the set* <sup>N</sup> *of natural numbers. We write* <sup>|</sup>=*csa* (P, Ω) *if every infinite callsequence* <sup>g</sup><sup>0</sup>g<sup>1</sup>g<sup>2</sup> ···∈ **InfCallseq**(P) *satisfies the parity condition w.r.t.* <sup>Ω</sup>*, i.e., the largest number occurring infinitely often in* Ω(g<sup>0</sup>)Ω(g<sup>1</sup>)Ω(g<sup>2</sup>)... *is even.* Call-sequence analysis *is the problem of, given a program* P *with a priority assignment* <sup>Ω</sup>*, deciding whether* <sup>|</sup>=*csa* (P, Ω) *holds.*

For example, for P*app* and the priority assignment Ω*app* = {app → 3, f<sub>a</sub> → 1, f<sub>b</sub> → 2}, |=*csa* (P*app*, Ω*app*) holds.
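This instance is small enough to check mechanically: the unique infinite call sequence (f<sub>b</sub> f<sub>a</sub><sup>5</sup>)<sup>ω</sup> can be generated directly from the bodies of f<sub>b</sub> and f<sub>a</sub>, and the parity condition with respect to Ω*app* inspected on one period. A Python sketch (the encoding is ours, not from the paper; app is omitted since it occurs only in finite chains):

```python
# One step of the recursive-call relation for P_app: both f_b and f_a call
# f_a (x - 1) (via app) when x > 0, and f_b 5 otherwise, so the callee
# depends only on the current argument.
def next_call(x):
    return ("fa", x - 1) if x > 0 else ("fb", 5)

def call_chain(n):
    """First n function symbols of the unique infinite call sequence of
    P_app, starting from the initial call f_b 5."""
    chain, f, x = [], "fb", 5
    for _ in range(n):
        chain.append(f)
        f, x = next_call(x)
    return chain

# Priorities of the functions occurring in the infinite chain, per Omega_app.
omega = {"fb": 2, "fa": 1}
```

On one period [f<sub>b</sub>, f<sub>a</sub>, f<sub>a</sub>, f<sub>a</sub>, f<sub>a</sub>, f<sub>a</sub>] the largest priority occurring is 2, which is even, so the parity condition holds.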

The call-sequence analysis can naturally be reduced to HFL model checking against the trivial LTS L<sub>0</sub> = ({s<sub>⋆</sub>}, ∅, ∅, s<sub>⋆</sub>) (or to validity checking).

**Definition 8.** *Let* P = (D, t) *be a program and* Ω *a priority assignment for* P*. The HES* Φ<sub>(P,Ω),csa</sub> *is* (D<sup>†csa</sup>, t<sup>†csa</sup>)*, where* D<sup>†csa</sup> *and* t<sup>†csa</sup> *are defined by:*

$$\begin{array}{l} \{f\_1\ \widetilde{x}\_1 = t\_1, \ldots, f\_n\ \widetilde{x}\_n = t\_n\}^{\dagger\_{csa}} = (f\_1\ \widetilde{x}\_1 =\_{\alpha\_1} t\_1^{\dagger\_{csa}}; \cdots; f\_n\ \widetilde{x}\_n =\_{\alpha\_n} t\_n^{\dagger\_{csa}}) \\\\ (\ )^{\dagger\_{csa}} = \textbf{true} \qquad x^{\dagger\_{csa}} = x \qquad n^{\dagger\_{csa}} = n \qquad (t\_1\ op\ t\_2)^{\dagger\_{csa}} = t\_1^{\dagger\_{csa}}\ op\ t\_2^{\dagger\_{csa}} \\\\ (\textbf{if}\ p(t'\_1, \ldots, t'\_k)\ \textbf{then}\ t\_1\ \textbf{else}\ t\_2)^{\dagger\_{csa}} = (p(t'\_1{}^{\dagger\_{csa}}, \ldots, t'\_k{}^{\dagger\_{csa}}) \Rightarrow t\_1^{\dagger\_{csa}}) \land (\neg p(t'\_1{}^{\dagger\_{csa}}, \ldots, t'\_k{}^{\dagger\_{csa}}) \Rightarrow t\_2^{\dagger\_{csa}}) \\\\ (\textbf{event}\ a; t)^{\dagger\_{csa}} = t^{\dagger\_{csa}} \qquad (t\_1\ t\_2)^{\dagger\_{csa}} = t\_1^{\dagger\_{csa}}\ t\_2^{\dagger\_{csa}} \qquad (t\_1 \mathbin{\Box} t\_2)^{\dagger\_{csa}} = t\_1^{\dagger\_{csa}} \land t\_2^{\dagger\_{csa}}. \end{array}$$

*Here, we assume that* Ω(f<sub>i</sub>) ≥ Ω(f<sub>i+1</sub>) *for each* i ∈ {1,...,n−1}*, and* α<sub>i</sub> = ν *if* Ω(f<sub>i</sub>) *is even and* α<sub>i</sub> = μ *otherwise.*

The following theorem states the soundness and completeness of the reduction. See [23] for a proof.

**Theorem 4.** *Let* P *be a program and* Ω *be a priority assignment for* P*. Then* <sup>|</sup>=*csa* (P, Ω) *if and only if* <sup>L</sup><sup>0</sup> <sup>|</sup><sup>=</sup> <sup>Φ</sup>(P,Ω),*csa .*

*Example 9.* For P*app* and Ω*app* above, (P*app*, Ω*app*)<sup>†csa</sup> = (E, f<sub>b</sub> 5), where E is:

$$\begin{array}{l} app\ h\ x =\_\mu h\ x; \\\\ f\_b\ x =\_\nu (x > 0 \Rightarrow app\ f\_a\ (x-1)) \land (x \le 0 \Rightarrow app\ f\_b\ 5); \\\\ f\_a\ x =\_\mu (x > 0 \Rightarrow app\ f\_a\ (x-1)) \land (x \le 0 \Rightarrow app\ f\_b\ 5). \end{array}$$

Note that L<sub>0</sub> |= (P*app*, Ω*app*)<sup>†csa</sup> holds.

#### **7.2 From Temporal Verification to Call-Sequence Analysis**

This subsection shows a reduction from the temporal verification problem of whether **InfTraces**(P) ∩ L = ∅ holds to a call-sequence analysis problem |=*csa* (P′, Ω).

For the sake of simplicity, we assume without loss of generality that every program P = (D, t) in this section is non-terminating and every infinite reduction sequence produces infinite events, so that **FullTraces**(P) = **InfTraces**(P) holds. We also assume that the ω-regular language L for the temporal verification problem is specified by using a non-deterministic, parity word automaton [10]. We recall the definition of non-deterministic, parity word automata below.

**Definition 9 (Parity automaton).** *A* non-deterministic parity word automaton *is a quintuple* A = (Q, Σ, δ, q<sub>I</sub>, Ω) *where (i)* Q *is a finite set of states; (ii)* Σ *is a finite alphabet; (iii)* δ*, called the transition function, is a* total *map from* Q × Σ *to* 2<sup>Q</sup>*; (iv)* q<sub>I</sub> ∈ Q *is the initial state; and (v)* Ω ∈ Q → N *is the priority function. A* run *of* A *on an* ω*-word* a<sub>0</sub>a<sub>1</sub> ··· ∈ Σ<sup>ω</sup> *is an infinite sequence of states* ρ = ρ(0)ρ(1)··· ∈ Q<sup>ω</sup> *such that (i)* ρ(0) = q<sub>I</sub>*, and (ii)* ρ(i+1) ∈ δ(ρ(i), a<sub>i</sub>) *for each* i ∈ ω*. An* ω*-word* w ∈ Σ<sup>ω</sup> *is* accepted *by* A *if there exists a run* ρ *of* A *on* w *such that* **max**{Ω(q) | q ∈ **Inf**(ρ)} *is even, where* **Inf**(ρ) *is the set of states that occur infinitely often in* ρ*. We write* L(A) *for the set of* ω*-words accepted by* A*.*

For technical convenience, we assume below that δ(q, a) ≠ ∅ for every q ∈ Q and a ∈ Σ; this does not lose generality, since if δ(q, a) = ∅, we can introduce a new "dead" state q<sub>dead</sub> (with priority 1) and change δ(q, a) to {q<sub>dead</sub>}. Given a parity automaton A, we refer to its components as Q<sub>A</sub>, Σ<sub>A</sub>, δ<sub>A</sub>, q<sub>I,A</sub>, and Ω<sub>A</sub>.

*Example 10.* Consider the automaton A<sub>ab</sub> = ({q<sub>a</sub>, q<sub>b</sub>}, {a, b}, δ, q<sub>a</sub>, Ω), where δ is as given in Fig. 4, Ω(q<sub>a</sub>) = 0, and Ω(q<sub>b</sub>) = 1. Then L(A<sub>ab</sub>) = {a, b}<sup>ω</sup> \ (a<sup>∗</sup>b)<sup>ω</sup> = (a<sup>∗</sup>b)<sup>∗</sup>a<sup>ω</sup>, i.e., the set of infinite words that contain only finitely many b's.

The goal of this subsection is, given a program P and a parity word automaton A, to construct another program P′ and a priority assignment Ω for P′ such that **InfTraces**(P) ∩ L(A) = ∅ if and only if |=*csa* (P′, Ω).
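Acceptance of an ultimately periodic word u v<sup>ω</sup> by A<sub>ab</sub> can be checked by running δ until the state at the start of the period repeats and then inspecting the largest priority on the loop. A Python sketch (encoding ours; in A<sub>ab</sub> the successor state depends only on the letter read, so δ collapses to a map from letters to states):

```python
# A_ab: delta(q, a) = {q_a} and delta(q, b) = {q_b} for both states,
# with Omega(q_a) = 0 and Omega(q_b) = 1.
delta = {"a": "qa", "b": "qb"}
omega = {"qa": 0, "qb": 1}

def accepts(u, v):
    """Does A_ab accept the ultimately periodic word u . v^omega?"""
    q = "qa"                    # initial state q_a
    for a in u:                 # consume the finite prefix u
        q = delta[a]
    seen, prios = {}, []
    while q not in seen:        # iterate v until a period-start state repeats
        seen[q] = len(prios)
        for a in v:
            q = delta[a]
            prios.append(omega[q])
    loop = prios[seen[q]:]      # priorities along the repeating loop
    return max(loop) % 2 == 0   # parity condition
```

For instance, a<sup>ω</sup> is accepted (finitely many b's) while (ab)<sup>ω</sup> is not.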

Note that a necessary and sufficient condition for **InfTraces**(P) ∩ L(A) = <sup>∅</sup> is that no trace in **InfTraces**(P) has a run whose priority sequence satisfies the parity condition; in other words, for every sequence in **InfTraces**(P), and for every run for the sequence, the largest priority that occurs in the associated priority sequence is odd. As explained at the beginning of this section, we reduce this condition to a call sequence analysis problem by appropriately duplicating functions in a given program. For example, recall the program P<sup>0</sup>:

$$(\{f = \text{if } c \text{ then } (\text{event } \mathbf{a}; f) \text{ else } (\text{event } \mathbf{b}; f)\},\ f)$$

It is translated to the following program $P'_0$:

$$\begin{array}{l} (\{f_b = \text{if } c \text{ then } (\text{event } \mathbf{a}; f_a) \text{ else } (\text{event } \mathbf{b}; f_b),\\ \ \ \, f_a = \text{if } c \text{ then } (\text{event } \mathbf{a}; f_a) \text{ else } (\text{event } \mathbf{b}; f_b)\},\ f_a), \end{array}$$

where $c$ is some (closed) boolean expression. Since the largest priorities encountered before calling $f_a$ and $f_b$ (since the last recursive call) are 0 and 1 respectively, we assign those priorities plus 1 (to flip odd/even-ness) to $f_a$ and $f_b$ respectively. Then, the problem of whether $\textbf{InfTraces}(P_0) \cap L(\mathcal{A}_{ab}) = \emptyset$ is reduced to $\models_{csa} (P'_0, \{f_a \mapsto 1, f_b \mapsto 2\})$. Note here that the priorities of $f_a$ and $f_b$ represent *summaries* of the priorities (plus one) that occur in the run of the automaton until $f_a$ and $f_b$ are respectively called since the last recursive call; thus, the largest priority of states that occur infinitely often in the run for an infinite trace is equal to the largest priority that occurs infinitely often in the sequence of summaries $(\Omega(f_1)-1)(\Omega(f_2)-1)(\Omega(f_3)-1)\cdots$ computed from a corresponding call sequence $f_1 f_2 f_3 \cdots$.
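The call-sequence side of this correspondence can be sketched in the same style. Reading $\models_{csa}$ in a simplified way for ultimately periodic call sequences (the functions occurring in the loop are exactly those called infinitely often), it amounts to a parity check on the assigned priorities:

```python
def csa_holds(omega, loop):
    """Simplified reading of |=csa for an ultimately periodic call
    sequence: the largest priority (under the assignment omega) of the
    functions called infinitely often -- those in the loop -- is even."""
    return max(omega[f] for f in loop) % 2 == 0

# The assignment for P'_0 from the text.
OMEGA_P0 = {"f_a": 1, "f_b": 2}
```

With this assignment, a call sequence whose loop contains `f_b` (i.e. event `b` occurs infinitely often) satisfies the condition, while a loop of only `f_a` does not.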

Due to the presence of higher-order functions, the general reduction is more complicated than the example above. First, we need to replicate not only function symbols, but also arguments. For example, consider the following variation $P_1$ of $P_0$ above:

$$(\{g\,k = \text{if } c \text{ then } (\text{event } \mathbf{a}; k) \text{ else } (\text{event } \mathbf{b}; k), \quad f = g\,f\}, \quad f).$$

Here, we have just made the calls to $f$ indirect, by preparing the function $g$. Obviously, the two calls to $k$ in the body of $g$ must be distinguished from each other, since different priorities are encountered before the calls. Thus, we duplicate the argument $k$, and obtain the following program $P'_1$:

$$(\{g\,k_a\,k_b = \text{if } c \text{ then } (\text{event } \mathbf{a}; k_a) \text{ else } (\text{event } \mathbf{b}; k_b),\ f_a = g\,f_a\,f_b,\ f_b = g\,f_a\,f_b\},\ f_a).$$
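The duplicated program can be exercised directly. The following sketch simulates $P'_1$ for a fixed sequence of values of $c$, recording each emitted event together with the duplicated function called next (it assumes the main expression is $f_a$, and models events by appending to a trace):

```python
def run_P1_prime(choices):
    """Simulate P'_1 step by step; choices[i] is the value of the
    boolean c at the i-th call. Returns the list of
    (emitted event, duplicated function called next) pairs."""
    trace = []

    def g(k_a, k_b, c):
        # Body of g: the two copies of the argument k are used under
        # different events, hence with different priorities.
        if c:
            trace.append(("a", k_a.__name__))
            return k_a
        trace.append(("b", k_b.__name__))
        return k_b

    def f_a(c):
        return g(f_a, f_b, c)

    def f_b(c):
        return g(f_a, f_b, c)

    current = f_a                   # assumed main expression
    for c in choices:
        current = current(c)
    return trace
```

Running it shows that event `a` is always followed by a call to `f_a` and event `b` by a call to `f_b`, which is exactly the invariant the duplication establishes.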

Then, for the priority assignment $\Omega = \{f_a \mapsto 1, f_b \mapsto 2, g \mapsto 1\}$, $\textbf{InfTraces}(P_1) \cap L(\mathcal{A}_{ab}) = \emptyset$ if and only if $\models_{csa} (P'_1, \Omega)$. Secondly, we need to take into account not only the priorities of states visited by $\mathcal{A}$, but also the states themselves. For example, if we have a function definition $f\,h = h\,(\textbf{event } \texttt{a}; f\,h)$, the largest priority encountered before $f$ is recursively called in the body of $f$ depends on the priorities encountered inside $h$, *and also* the state of $\mathcal{A}$ when $h$ uses the argument $(\textbf{event } \texttt{a}; f\,h)$ (because the state after the $\texttt{a}$ event depends on the previous state in general). We therefore use *intersection types* (à la Kobayashi and Ong's intersection types for HORS model checking [21]) to represent summary information on how each function traverses states of the automaton, and replicate each function and its arguments for each type. We thus formalize the translation as an intersection-type-based program transformation; related transformation techniques are found in [8,11,12,20,38].

**Definition 10.** *Let* $\mathcal{A} = (Q, \Sigma, \delta, q_I, \Omega)$ *be a non-deterministic parity word automaton. Let* $q$ *and* $m$ *range over* $Q$ *and the set* $\mathit{codom}(\Omega)$ *of priorities, respectively. The set* $\mathbf{Types}_{\mathcal{A}}$ *of* intersection types*, ranged over by* $\theta$*, is defined by:*

$$\theta ::= q \mid \rho \to \theta \qquad\qquad \rho ::= \mathsf{int} \mid \bigwedge_{1 \le i \le k} (\theta_i, m_i)$$

*We assume a certain total order* $<$ *on* $\mathbf{Types}_{\mathcal{A}} \times \mathbb{N}$*, and require that in* $\bigwedge_{1 \le i \le k}(\theta_i, m_i)$*,* $(\theta_i, m_i) < (\theta_j, m_j)$ *holds for each* $i < j$*.* We often write $(\theta_1, m_1) \land \cdots \land (\theta_k, m_k)$ for $\bigwedge_{1 \le i \le k}(\theta_i, m_i)$, and $\top$ when $k = 0$.

Intuitively, the type $q$ describes expressions of the base simple type, which may be evaluated when the automaton $\mathcal{A}$ is in the state $q$ (here, we have in mind an execution of the *product* of a program and the automaton, where the latter takes events produced by the program and changes its states). The type $(\bigwedge_{1 \le i \le k}(\theta_i, m_i)) \to \theta$ describes functions that take an argument, use it according to types $\theta_1, \ldots, \theta_k$, and return a value of type $\theta$. Furthermore, the part $m_i$ describes that the argument may be used as a value of type $\theta_i$ only when the largest priority visited since the function was called is $m_i$. For example, given the automaton in Example 10, the function $\lambda x.(\textbf{event } \texttt{a}; x)$ may have types $(q_a, 0) \to q_a$ and $(q_a, 0) \to q_b$, because the body may be executed from state $q_a$ or $q_b$ (thus, the return type may be either of them), but $x$ is used only when the automaton is in state $q_a$ and the largest priority visited is 0. In contrast, $\lambda x.(\textbf{event } \texttt{b}; x)$ has types $(q_b, 1) \to q_a$ and $(q_b, 1) \to q_b$.
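The types of such event-prefixed identity functions can be enumerated mechanically. The sketch below assumes the transitions of $\mathcal{A}_{ab}$ (reading `a` leads to $q_a$, reading `b` to $q_b$; Fig. 4 is not reproduced here) and encodes a type $(q', m) \to q$ as the pair `((q', m), q)`:

```python
# Assumed nondeterministic transition function and priorities of A_ab.
delta = {("qa", "a"): {"qa"}, ("qa", "b"): {"qb"},
         ("qb", "a"): {"qa"}, ("qb", "b"): {"qb"}}
omega = {"qa": 0, "qb": 1}

def types_of_event_identity(e, states=("qa", "qb")):
    """Possible types of \\x.(event e; x): for each state q the body may
    be evaluated in (the return type), x is used in a successor state q2
    of (q, e), after visiting priority omega(q2) -- giving (q2, m) -> q."""
    result = set()
    for q in states:
        for q2 in delta[(q, e)]:
            result.add(((q2, omega[q2]), q))
    return result
```

This reproduces the types stated above: `types_of_event_identity("a")` yields $(q_a,0) \to q_a$ and $(q_a,0) \to q_b$, and `types_of_event_identity("b")` yields $(q_b,1) \to q_a$ and $(q_b,1) \to q_b$.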

Using the intersection types above, we shall define a type-based transformation relation of the form $\Gamma \vdash_{\mathcal{A}} t : \theta \Rightarrow t'$, where $t$ and $t'$ are the source and target terms of the transformation, and $\Gamma$, called an *intersection type environment*, is a finite set of type bindings of the form $x : \mathsf{int}$ or $x : (\theta, m, m')$. We allow multiple type bindings for a variable $x$, except for $x : \mathsf{int}$ (i.e., if $x : \mathsf{int} \in \Gamma$, then this must be the unique type binding for $x$ in $\Gamma$). The binding $x : (\theta, m, m')$ means that $x$ should be used as a value of type $\theta$ when the largest priority visited is $m$; $m'$ is auxiliary information used to record the largest priority encountered so far.

The transformation relation $\Gamma \vdash_{\mathcal{A}} t : \theta \Rightarrow t'$ is inductively defined by the rules in Fig. 5. (For technical convenience, we have extended terms with λ-abstractions; they may occur only at top-level function definitions.) In the figure, $[k]$ denotes the set $\{i \in \mathbb{N} \mid 1 \le i \le k\}$. The operation $\Gamma \uparrow m$ used in the figure is defined by:

$$\Gamma \uparrow m = \{x : \mathsf{int} \mid x : \mathsf{int} \in \Gamma\} \cup \{x : (\theta, m_1, \mathbf{max}(m_2, m)) \mid x : (\theta, m_1, m_2) \in \Gamma\}$$
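The operation is straightforward to state operationally. A minimal sketch, representing an environment as a list of `(variable, binding)` pairs where a binding is either the string `"int"` or a triple $(\theta, m_1, m_2)$:

```python
def lift(gamma, m):
    """The operation Gamma ^ m: when priority m is encountered, update
    the recorded largest-priority component m2 of every non-int binding
    to max(m2, m); int bindings are unchanged."""
    out = []
    for (x, b) in gamma:
        if b == "int":
            out.append((x, "int"))
        else:
            theta, m1, m2 = b
            out.append((x, (theta, m1, max(m2, m))))
    return out
```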

The operation is applied when the priority $m$ is encountered, in which case the largest priority encountered so far is updated accordingly. The key rules are IT-Var, IT-Event, IT-App, and IT-Abs. In IT-Var, the variable $x$ is replicated for each type; in the target of the translation, $x_{\theta,m}$ and $x_{\theta',m'}$ are treated as different variables if $(\theta, m) \neq (\theta', m')$. The rule IT-Event reflects the state change caused by the event $\texttt{a}$ in the type and the type environment. Since the state change may be non-deterministic, we transform $t$ for each of the next states $q_1, \ldots, q_n$, and combine the resulting terms with non-deterministic choice. The rules IT-App and IT-Abs replicate function arguments for each type. In addition, in IT-App, the operation $\Gamma \uparrow m_i$ reflects the fact that $t_2$ is used as a value of type $\theta_i$ after the priority $m_i$ is encountered. The other rules just transform terms in a compositional manner. If target terms are ignored, the rules are close to those of Kobayashi and Ong's type system for HORS model checking [21].

**Fig. 5.** Type-based transformation rules for terms

We now define the transformation for programs. A *top-level type environment* $\Xi$ is a finite set of type bindings of the form $x : (\theta, m)$. Like intersection type environments, $\Xi$ may have more than one binding for each variable. We write $\Xi \vdash_{\mathcal{A}} t : \theta \Rightarrow t'$ to mean $\{x : (\theta, m, 0) \mid x : (\theta, m) \in \Xi\} \vdash_{\mathcal{A}} t : \theta \Rightarrow t'$. For a set $D$ of function definitions, we write $\Xi \vdash_{\mathcal{A}} D \Rightarrow D'$ if $\mathit{dom}(D') = \{f_{\theta,m} \mid f : (\theta, m) \in \Xi\}$ and $\Xi \vdash_{\mathcal{A}} D(f) : \theta \Rightarrow D'(f_{\theta,m})$ for every $f : (\theta, m) \in \Xi$. For a program $P = (D, t)$, we write $\Xi \vdash_{\mathcal{A}} P \Rightarrow (P', \Omega')$ if $P' = (D', t')$, $\Xi \vdash_{\mathcal{A}} D \Rightarrow D'$ and $\Xi \vdash_{\mathcal{A}} t : q_I \Rightarrow t'$, with $\Omega'(f_{\theta,m}) = m + 1$ for each $f_{\theta,m} \in \mathit{dom}(D')$. We just write $\vdash_{\mathcal{A}} P \Rightarrow (P', \Omega')$ if $\Xi \vdash_{\mathcal{A}} P \Rightarrow (P', \Omega')$ holds for some $\Xi$.

*Example 11.* Consider the automaton $\mathcal{A}_{ab}$ in Example 10, and the program $P_2 = (D_2, f\ 5)$ where $D_2$ consists of the following function definitions:

> g k = (**event** a; k) □ (**event** b; k),
> f x = **if** x > 0 **then** g (f(x − 1)) **else** (**event** b; f 5).

Let $\Xi$ be: $\{g : ((q_a, 0) \land (q_b, 1) \to q_a,\ 0),\ g : ((q_a, 0) \land (q_b, 1) \to q_b,\ 0),\ f : (\mathsf{int} \to q_a,\ 0),\ f : (\mathsf{int} \to q_b,\ 1)\}$. Then, $\Xi \vdash_{\mathcal{A}} P_2 \Rightarrow ((D'_2,\ f_{\mathsf{int}\to q_a,0}\ 5), \Omega)$ where:

$$\begin{array}{l}
D'_2 = \{\, g_{(q_a,0)\land(q_b,1)\to q_a,0}\ k_{q_a,0}\ k_{q_b,1} = t_g, \quad g_{(q_a,0)\land(q_b,1)\to q_b,0}\ k_{q_a,0}\ k_{q_b,1} = t_g,\\
\qquad\ \ f_{\mathsf{int}\to q_a,0}\ x_{\mathsf{int}} = t_{f,q_a}, \quad f_{\mathsf{int}\to q_b,1}\ x_{\mathsf{int}} = t_{f,q_b} \,\}\\
t_g = (\textbf{event } \texttt{a}; k_{q_a,0}) \mathbin{\square} (\textbf{event } \texttt{b}; k_{q_b,1})\\
t_{f,q} = \textbf{if } x_{\mathsf{int}} > 0 \textbf{ then } g_{(q_a,0)\land(q_b,1)\to q,0}\ (f_{\mathsf{int}\to q_a,0}\ (x_{\mathsf{int}} - 1))\ (f_{\mathsf{int}\to q_b,1}\ (x_{\mathsf{int}} - 1))\\
\qquad\quad \textbf{else } (\textbf{event } \texttt{b}; f_{\mathsf{int}\to q_b,1}\ 5) \qquad (\text{for each } q \in \{q_a, q_b\})\\
\Omega = \{\, g_{(q_a,0)\land(q_b,1)\to q_a,0} \mapsto 1,\ g_{(q_a,0)\land(q_b,1)\to q_b,0} \mapsto 1,\ f_{\mathsf{int}\to q_a,0} \mapsto 1,\ f_{\mathsf{int}\to q_b,1} \mapsto 2 \,\}.
\end{array}$$

Notice that $f$, $g$, and the arguments of $g$ have been duplicated. Furthermore, whenever $f_{\theta,m}$ is called, the largest priority that has been encountered since the last recursive call is $m$. For example, in the then-clause of $f_{\mathsf{int}\to q_a,0}$, $f_{\mathsf{int}\to q_b,1}(x-1)$ may be called through $g_{(q_a,0)\land(q_b,1)\to q_a,0}$. Since $g_{(q_a,0)\land(q_b,1)\to q_a,0}$ uses its second argument only after an event $\texttt{b}$, the largest priority encountered is 1. This property is important for the correctness of our reduction.
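The priority assignment in Example 11 is determined entirely by $\Xi$, via $\Omega(f_{\theta,m}) = m + 1$. A minimal sketch, encoding the types $\theta$ of Example 11 as plain strings and duplicated functions as `(f, θ)` pairs:

```python
def priority_assignment(xi):
    """Omega(f_{theta,m}) = m + 1 for each top-level binding
    f : (theta, m) in Xi; duplicated functions are keyed by (f, theta)."""
    return {(f, theta): m + 1 for (f, theta, m) in xi}

# The environment Xi of Example 11, with types written as strings.
XI = [("g", "(qa,0)∧(qb,1)→qa", 0), ("g", "(qa,0)∧(qb,1)→qb", 0),
      ("f", "int→qa", 0), ("f", "int→qb", 1)]
```

Applied to `XI`, this yields priority 1 for both copies of `g` and for `f_{int→qa,0}`, and priority 2 for `f_{int→qb,1}`, matching the $\Omega$ of the example.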

The following theorems state that our reduction is sound and complete, and that there is an effective algorithm for the reduction; see [23] for proofs.

**Theorem 5.** *Let* $P$ *be a program and* $\mathcal{A}$ *a parity automaton. Suppose that* $\Xi \vdash_{\mathcal{A}} P \Rightarrow (P', \Omega)$*. Then* $\textbf{InfTraces}(P) \cap L(\mathcal{A}) = \emptyset$ *if and only if* $\models_{csa} (P', \Omega)$*.*

**Theorem 6.** *For every* $P$ *and* $\mathcal{A}$*, one can effectively construct* $\Xi$*,* $P'$ *and* $\Omega$ *such that* $\Xi \vdash_{\mathcal{A}} P \Rightarrow (P', \Omega)$*.*

The proof of Theorem 6 above also implies that the reduction from temporal property verification to call-sequence analysis can be performed in polynomial time. Combined with the reduction from call-sequence analysis to HFL model checking, we have thus obtained a polynomial-time reduction from the temporal verification problem of whether $\textbf{InfTraces}(P) \cap L(\mathcal{A}) = \emptyset$ to HFL model checking.

### **8 Related Work**

As mentioned in Sect. 1, our reduction from program verification problems to HFL model checking problems has been partially inspired by the translation of Kobayashi et al. [19] from HORS model checking to HFL model checking. As in their translation (and unlike in previous applications of HFL model checking [28,42]), our translation switches the roles of properties and models (or programs) to be verified. Although a combination of their translation with Kobayashi's reduction from program verification to HORS model checking [17,18] yields an (indirect) translation from *finite-data* programs to pure HFL model checking problems, the combination does not work for infinite-data programs. In contrast, our translation is sound and complete even for infinite-data programs. Among the translations in Sects. 5, 6 and 7, the translation in Sect. 7.2 shares some similarity with their translation, in that functions and their arguments are replicated for each priority. The actual translations are, however, quite different; ours is type-directed and optimized for a given automaton, whereas theirs is not. This difference comes from the difference of goals: the goal of [19] was to clarify the relationship between HORS and HFL, hence their translation was designed to be independent of an automaton. The proof of the correctness of our translation in Sect. 7 is much more involved, due to the need for dealing with integers. Whilst the proof of [19] could reuse the type-based characterization of HORS model checking [21], we had to generalize arguments in both [19,21] to work on infinite-data programs.

Lange et al. [28] have shown that various process equivalence checking problems (such as bisimulation and trace equivalence) can be reduced to (pure) HFL model checking problems. The idea of their reduction is quite different from ours. They reduce processes to LTSs, whereas we reduce programs to HFL formulas.

Major approaches to automated or semi-automated higher-order program verification have been HORS model checking [17,18,22,27,31,33,43], (refinement) type systems [14,24,34–36,39,41,44], Horn clause solving [2,7], and their combinations. As already discussed in Sect. 1, compared with the HORS model checking approach, our new approach provides more uniform, streamlined methods. Whilst the HORS model checking approach is for fully automated verification, our approach enables various degrees of automation: after verification problems are automatically translated to HFL**<sup>Z</sup>** formulas, one can prove them (i) interactively using a proof assistant like Coq (see [23]), (ii) semi-automatically, by letting users provide hints for induction/co-induction and discharging the rest of proof obligations by (some extension of) an SMT solver, or (iii) fully automatically by recasting the techniques used in the HORS-based approach; for example, to deal with the ν-only fragment of HFL**Z**, we can reuse the technique of predicate abstraction [22]. For a more technical comparison between the HORS-based approach and our HFL-based approach, see [23].

As for type-based approaches [14,24,34–36,39,41,44], most of the refinement type systems are (i) restricted to safety properties, and/or (ii) incomplete. A notable exception is the recent work of Unno et al. [40], which provides a relatively complete type system for the classes of properties discussed in Sect. 5. Our approach deals with a wider class of properties (cf. Sects. 6 and 7). Their "relative completeness" property relies on Gödel coding of functions, which cannot be exploited in practice.

The reductions from program verification to Horn clause solving have recently been advocated [2–4] or used [34,39] (via refinement type inference problems) by a number of researchers. Since Horn clauses can be expressed in a fragment of HFL without modal operators, fixpoint alternations (between ν and μ), and higher-order predicates, our reductions to HFL model checking may be viewed as extensions of those approaches. Higher-order predicates and fixpoints over them allowed us to provide sound and complete characterizations of properties of higher-order programs for a wider class of properties. Bjørner et al. [4] proposed an alternative approach to obtaining a complete characterization of safety properties, which defunctionalizes higher-order programs by using algebraic data types and then reduces the problems to (first-order) Horn clauses. A disadvantage of that approach is that control flow information of higher-order programs is also encoded into algebraic data types; hence, even for finite-data higher-order programs, the Horn clauses obtained by the reduction belong to an undecidable fragment. In contrast, our reductions yield pure HFL model checking problems for finite-data programs. Burn et al. [7] have recently advocated the use of *higher-order* (constrained) Horn clauses for verification of safety properties (which correspond to the negation of the may-reachability properties discussed in Sect. 5.1 of the present paper) of higher-order programs. They interpret recursion using the least fixpoint semantics, so their higher-order Horn clauses roughly correspond to a fragment of HFL**<sup>Z</sup>** without modal operators and fixpoint alternations. They have not shown a general, concrete reduction from safety property verification to higher-order Horn clause solving.

The characterization of the reachability problems in Sect. 5 in terms of formulas without modal operators is reminiscent of the predicate transformers [9,13] used for computing the weakest preconditions of imperative programs. In particular, [5] and [13] used least fixpoints to express weakest preconditions for while-loops and recursions, respectively.

### **9 Conclusion**

We have shown that various verification problems for higher-order functional programs can be naturally reduced to (extended) HFL model checking problems. In all the reductions, a program is mapped to an HFL formula expressing the property that the behavior of the program is correct. For developing verification tools for higher-order functional programs, our reductions allow us to focus on the development of (automated or semi-automated) HFL**<sup>Z</sup>** model checking tools (or, even more simply, theorem provers for HFL**<sup>Z</sup>** without modal operators, as the reductions of Sects. 5 and 7 yield HFL formulas without modal operators). To this end, we have developed a prototype model checker for pure HFL (without integers), which will be reported in a separate paper. Work is under way to develop HFL**<sup>Z</sup>** model checkers by recasting the techniques [22,26,27,43] developed for the HORS-based approach, which, together with the reductions presented in this paper, would yield fully automated verification tools. We have also started building a Coq library for interactively proving HFL**<sup>Z</sup>** formulas, as briefly discussed in [23]. As a final remark, although one may fear that our reductions map program verification problems to "harder" problems due to the expressive power of HFL**<sup>Z</sup>**, this is actually not the case, at least for the classes of problems in Sects. 5 and 6, which use only the alternation-free fragment of HFL**<sup>Z</sup>**. The model checking problems for μ-only or ν-only HFL**<sup>Z</sup>** are semi-decidable and co-semi-decidable respectively, like the source verification problems of may/must-reachability of closed programs and their negations.

**Acknowledgment.** We would like to thank the anonymous referees for useful comments. This work was supported by JSPS KAKENHI Grant Numbers JP15H05706 and JP16K16004.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Quantitative Analysis of Smart Contracts**

Krishnendu Chatterjee<sup>1</sup>, Amir Kafshdar Goharshady<sup>1</sup>, and Yaron Velner<sup>2</sup>

<sup>1</sup> IST Austria (Institute of Science and Technology Austria), Klosterneuburg, Austria
{krishnendu.chatterjee,amir.goharshady}@ist.ac.at

<sup>2</sup> Hebrew University of Jerusalem, Jerusalem, Israel
yaron.welner@mail.huji.ac.il

**Abstract.** Smart contracts are computer programs that are executed by a network of mutually distrusting agents, without the need for an external trusted authority. Smart contracts handle and transfer assets of considerable value (in the form of crypto-currency like Bitcoin). Hence, it is crucial that their implementation is bug-free. We identify the utility (or expected payoff) of interacting with such smart contracts as the basic and canonical quantitative property for such contracts. We present a framework for such quantitative analysis of smart contracts. Such a formal framework poses new and novel research challenges in programming languages, as it requires modeling of game-theoretic aspects to analyze incentives for deviation from honest behavior and modeling utilities which are not specified as standard temporal properties such as safety and termination. While game-theoretic incentives have been analyzed in the security community, their analysis has been restricted to the very special case of stateless games. However, to analyze smart contracts, stateful analysis is required, as it must account for the different program states of the protocol. Our main contributions are as follows: we present (i) a simplified programming language for smart contracts; (ii) an automatic translation of the programs to state-based games; (iii) an abstraction-refinement approach to solve such games; and (iv) experimental results on real-world-inspired smart contracts.

### **1 Introduction**

In this work we present a quantitative, stateful, game-theoretic framework for formal analysis of smart contracts.

*Smart Contracts.* Hundreds of crypto-currencies are in use today, and investments in them are increasing steadily [24]. These currencies are not controlled by any central authority like governments or banks; instead, they are governed by the *blockchain* protocol, which dictates the rules and determines the outcomes, e.g., the validity of money transactions and account balances. Blockchain was initially used for peer-to-peer Bitcoin payments [43], but more recently it is also used for running programs (called smart contracts). A *smart contract* is a program that runs on the blockchain, which enforces its correct execution (i.e., that it is running as originally programmed). This is done by encoding semantics in crypto-currency transactions. For example, Bitcoin transaction scripts allow users to specify conditions, or contracts, which the transactions must satisfy prior to acceptance. Transaction scripts can encode many useful functions, such as validating that a payer owns a coin she is spending or enforcing rules for multi-party transactions. The Ethereum crypto-currency [16] allows arbitrary stateful Turing-complete conditions over the transactions, which gives rise to smart contracts that can implement a wide range of applications, such as financial instruments (e.g., financial derivatives or wills) or autonomous governance applications (e.g., voting systems). The protocols are globally specified and their implementation is decentralized. Therefore, there is no central authority and they are immutable. Hence, the economic consequences of bugs in a smart contract cannot be reverted.

A longer version of this article is available in [19].

© The Author(s) 2018
A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 739–767, 2018. https://doi.org/10.1007/978-3-319-89884-1_26

*Types of Bugs.* There are two types of bugs with monetary consequences: coding errors, and incentivization bugs that make dishonest behavior profitable for some of the interacting agents.


*DAO Attack: Interaction of Two Types of Bugs.* Quite interestingly, a coding bug can incentivize dishonest behavior, as in the famous DAO attack [48]. The Decentralized Autonomous Organization (DAO) [38] is an Ethereum smart contract [51]. The contract implements an investor-directed venture capital fund. On June 17, 2016, an attacker exploited a bug in the contract to extract \$80 million [48]. Intuitively, the root cause was that the contract allowed users to first get hold of their funds, and only then updated their balance records, while a semantic detail allowed the attacker to withdraw multiple times before the update.
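The root cause described above can be reproduced in a toy model (all names are hypothetical; this is not the DAO code): the contract releases funds and hands control to the caller *before* updating its balance record, so a re-entrant callback can withdraw repeatedly against the same recorded balance.

```python
class VulnerableBank:
    """Toy model of the flaw: withdraw() releases funds and calls back
    into user code before zeroing the user's recorded balance."""
    def __init__(self, balances):
        self.balances = dict(balances)
        self.ether = sum(balances.values())   # total funds held

    def withdraw(self, user, callback):
        amount = self.balances[user]
        if amount > 0 and self.ether >= amount:
            self.ether -= amount      # funds leave the contract...
            callback(self, user)      # ...control passes to the user...
            self.balances[user] = 0   # ...and only then is the record updated

def make_attacker(max_reentries):
    """A callback that re-enters withdraw() while its recorded balance
    is still non-zero (bounded so the simulation terminates)."""
    state = {"n": 0}
    def attacker(bank, user):
        if state["n"] < max_reentries:
            state["n"] += 1
            bank.withdraw(user, attacker)
    return attacker
```

Starting from balances `{"mallory": 10, "alice": 90}`, a single call `bank.withdraw("mallory", make_attacker(3))` drains 40 units against a recorded balance of 10.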

*Necessity of Formal Framework.* Since bugs in smart contracts have direct economic consequences and are irreversible, they have the same status as safety-critical errors for programs and reactive systems and must be detected before deployment. Moreover, smart contracts are deployed rapidly. There are over a million smart contracts in Ethereum, holding over 15 billion dollars at the time of writing [31]. It is impossible for security researchers to analyze all of them, and the lack of automated tools for programmers makes them error-prone. Hence, a formal analysis framework for smart contract bugs is of great importance.

*Utility Analysis.* In verification of programs, specifying objectives is non-trivial, and a key goal is to consider specification-less verification, where basic properties are considered canonical. For example, termination is a basic property in program analysis; and data-race freedom or serializability are basic properties in concurrency. Given these properties, models are verified with respect to them without considering any other specification. For smart contracts, describing the correct specification that prevents dishonest behavior is more challenging due to the presence of game-like interactions. We propose to consider the expected user utility (or payoff) that is guaranteed even in the presence of adversarial behavior of other agents as a canonical property. Considering malicious adversaries is standard in game theory. For example, the expected utility of a fair lottery is 0. An analysis reporting a different utility signifies a bug.
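The fair-lottery example can be made concrete. A minimal sketch of expected utility over a finite outcome distribution (the lottery parameters below are illustrative, not from the paper):

```python
from fractions import Fraction

def expected_utility(outcomes):
    """Expected payoff over a finite distribution given as
    (probability, payoff) pairs; probabilities must sum to 1."""
    assert sum(Fraction(p) for p, _ in outcomes) == 1
    return sum(Fraction(p) * u for p, u in outcomes)

# A fair lottery with N tickets: pay 1 to enter, win N with
# probability 1/N (net payoff N - 1 on winning, -1 otherwise).
N = 10
fair_lottery = [(Fraction(1, N), N - 1), (Fraction(N - 1, N), -1)]
```

Here `expected_utility(fair_lottery)` is 0, as stated above; an analysis reporting any other value for a supposedly fair lottery would indicate a bug.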

*New Research Challenges.* Coding bugs are detected by classic verification, program analysis, and model checking tools [23,39]. However, a formal framework for incentivization bugs presents a new research challenge for the programming language community. Their analysis must overcome two obstacles: (a) the framework will have to handle game-theoretic aspects to model interactions and incentives for dishonest behavior; and (b) it will have to handle properties that cannot be deduced from standard temporal properties such as safety or termination, but require analysis of monetary gains (i.e., quantitative properties).

While game-theoretic incentives are widely analyzed by the security community (e.g., see [13]), their analysis is typically restricted to the very special case of one-shot games that do not consider different states of the program, and thus the consequences of decisions on the next state of the program are ignored. In addition, their analysis is typically ad hoc and stems from brainstorming and special techniques. This could work when very few protocols existed (e.g., when Bitcoin first emerged) and deep thought was put into making them elegant and analyzable. However, the fast deployment of smart contracts makes it crucial to automate the process and make it accessible to programmers.

*Our Contribution.* In this work we present a formal framework for quantitative analysis of utilities in smart contracts. Our contributions are as follows:


(namely, turn-based games) have been studied in verification and reactive synthesis, there are no practical methods to solve general concurrent quantitative games. To the best of our knowledge, there are no tools to solve quantitative concurrent games other than academic examples of few states, and we present the first practical method to solve quantitative concurrent games that scales to real-world smart contract analysis.

In summary, our contributions range from (i) modeling of smart contracts as state-based games, to (ii) an abstraction-refinement approach to solve such games, to (iii) experimental results on real-world smart contracts.

### **2 Background on Ethereum Smart Contracts**

### **2.1 Programmable Smart Contracts**

Ethereum [16] is a decentralized virtual machine, which runs programs called contracts. Contracts are written in a Turing-complete bytecode language, called Ethereum Virtual Machine (EVM) bytecode [53]. A contract is invoked by calling one of its functions, where each function is defined by a sequence of instructions. The contract maintains a persistent internal state and can receive (transfer) currency from (to) users and other contracts. Users send transactions to the Ethereum network to invoke functions. Each transaction may contain input parameters for the contract and an associated monetary amount, possibly 0, which is transferred from the user to the contract.

Upon receiving a transaction, the contract collects the money sent to it, executes a function according to input parameters, and updates its internal state. All transactions are recorded on a decentralized ledger, called blockchain. A sequence of transactions that begins from the creation of the network uniquely determines the state of each contract and balances of users and contracts. The blockchain does not rely on a trusted central authority, rather, each transaction is processed by a large network of mutually untrusted peers called miners. Users constantly broadcast transactions to the network. Miners add transactions to the blockchain via a proof-of-work consensus protocol [43].

*Subtleties.* In this work, for simplicity, we ignore some details in the underlying protocol of Ethereum smart contracts. We briefly describe these details below:

– *Transaction fees.* In exchange for including her transactions in the blockchain, a user pays transaction fees to the miners, proportional to the execution time of her transaction. This fact could slightly affect the monetary analysis of the user's gain, but it could also introduce bugs in a program, as there is a bound on execution time that cannot be exceeded. Hence, it is possible that some functions can never be called, or even worse, a user could actively supply input parameters that would prevent other users from invoking a certain function.


### **2.2 Tokens and User Utility**

A user's utility is determined by the Ether she spends and receives, but could also be affected by the state of the contract. Most notably, smart contracts are used to issue *tokens*, which can be viewed as a stake in a company or an organization, in return for an Ether (or token) investment (see an example in Fig. 1). These tokens are *transferable* among users and are traded in exchanges in return for Ether, Bitcoin, and fiat money. At the time of writing, smart contracts instantiate tokens worth billions of dollars [32]. Hence, gaining or losing tokens has clear utility for the user. At a larger scope, user utility could also be affected by more abstract storage changes. Some users would be willing to pay to have a contract declare them as Kings of Ether [4], while others could gain from registering their domain name in a smart contract's storage [40]. In the examples provided in this work we mainly focus on utility that arises from Ether, tokens, and the like. However, our approach is general and can model any form of utility by introducing auxiliary utility variables and definitions.


**Fig. 1.** Token contract example.

### **3 Programming Language for Smart Contracts**

In this section we present our programming language for smart contracts, which supports concurrent interactions between parties. A party denotes an agent that decides to interact with the contract. A contract is a tuple C = (N, I, M, R, X₀, F, T) where X := N ∪ I ∪ M is a set of variables, R describes the range of values that can be stored in each variable, X₀ gives the initial values stored in the variables, F is a list of functions, and T describes, for each function, the time segment in which it can be invoked. We now formalize these concepts.

*Variables.* There are three distinct and disjoint types of variables in X:


*Bounds and Initial Values.* The tuple R = (R̲, R̄), where R̲, R̄ : N ∪ M → ℤ, represents lower and upper bounds for the integer values that can be stored in a variable. For example, if n ∈ N, then n can only store integers between R̲(n) and R̄(n). Similarly, if m ∈ M is a mapping and i ∈ I stores the address of a party in the contract, then m[i] can store integers between R̲(m) and R̄(m). The function X₀ : X → ℤ ∪ {Null} assigns an initial value to every variable. The assigned value is an integer in the case of numeric and mapping variables, i.e., a mapping variable maps everything to its initial value by default. Id variables can be initialized either to Null or to an id used by one of the parties.

*Functions and Timing.* The sequence F = ⟨f₁, f₂, …, fₙ⟩ is a list of functions and T = (T̲, T̄), where T̲, T̄ : F → ℕ. The function fᵢ can only be invoked in the time frame T(fᵢ) = [T̲(fᵢ), T̄(fᵢ)]. The contract uses a global clock, for example the current block number in the blockchain, to keep track of time.
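The range and timing constraints above can be sketched as a small Python model (all names here are ours, chosen for illustration; they are not part of the paper's formalism):

```python
# Illustrative sketch: variable ranges R = (lower, upper) and function
# time frames T = (lower, upper) of a contract, with membership checks.

class Contract:
    def __init__(self, ranges, timeframes):
        self.ranges = ranges          # variable -> (R_lower(x), R_upper(x))
        self.timeframes = timeframes  # function -> (T_lower(f), T_upper(f))

    def in_range(self, var, value):
        lo, hi = self.ranges[var]
        return lo <= value <= hi

    def callable_at(self, fn, t):
        lo, hi = self.timeframes[fn]
        return lo <= t <= hi

# Loosely modeled on the rock-paper-scissors contract of Fig. 2:
# registerBob is callable in [1, 10], play in [11, 15].
c = Contract(ranges={"Bids": (0, 100)},
             timeframes={"registerBob": (1, 10), "play": (11, 15)})
assert c.in_range("Bids", 50) and not c.in_range("Bids", 101)
assert c.callable_at("registerBob", 10) and not c.callable_at("registerBob", 11)
```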

Note that we consider a single contract, and interaction between multiple contracts is a subject of future work.

### **3.1 Syntax**

We provide a simple overview of our contract programming language. Our language is syntactically similar to Solidity [30], which is a widely used language for writing Ethereum contracts. A translation mechanism for different aspects is discussed in [19]. An example contract, modeling a game of rock-paper-scissors, is given in Fig. 2. Here, a party, called issuer has issued the contract and taken the role of Alice. Any other party can join the contract by registering as Bob and then playing rock-paper-scissors. To demonstrate our language, we use a bidding mechanism.

*Declaration of Variables.* The program begins by declaring the variables¹, their type, name, range, and initial value. For example, Bids is a map variable that assigns a value between 0 and 100 to every id; this value is initially 0. Line numbers (labels) are defined in Sect. 3.2 below and are not part of the syntax.

*Declaration of Functions.* After the variables, the functions are defined one by one. Each function begins with the keyword function followed by its name and

¹ For simplicity, we demonstrate our method with global variables only. However, the method is applicable to general variables as long as their ranges are well-defined at each point of the program.

**Fig. 2.** A rock-paper-scissors contract.

the time interval in which it can be called by parties. Then comes a list of input parameters. Each parameter is of the form variable : party, which means that the designated party can choose a value for that variable. The chosen value is required to be in the range specified for that variable. The keyword caller denotes the party that has invoked this function, and payable signifies that the party should not only decide a value, but must also pay the amount she decides. For example, registerBob can be called at any time between 1 and 10 by any of the parties. At each such invocation the party that has called this function must pay some amount, which is saved in the variable bid. After the decisions and payments are made, the contract proceeds with executing the function.

*Types of Functions.* There are essentially two types of functions, depending on their parameters. *One-party functions*, such as registerBob and getReward, require parameters from caller only, while *multi-party functions*, such as play, ask several, potentially different, parties for input. In this case all parties provide their input decisions and payments concurrently, without being aware of the choices made by the other parties. A default value is specified for every decision, in case a relevant party does not take part.

*Summary.* Putting everything together, in the contract specified in Fig. 2, any party can claim the role of Bob between time 1 and time 10 by paying a bid to the contract, if the role is not already occupied. Then at time 11 one of the parties calls play and both parties have until time 15 to decide which choice (rock, paper, scissors or none) they want to make. Then the winner can call getReward and collect her prize.

### **3.2 Semantics**

In this section we present the details of the semantics. In our programming language there are several key aspects which are non-standard in programming languages, such as the notion of time progress, concurrency, and interactions of several parties. Hence we present a detailed description of the semantics. We start with the requirements.

*Requirements.* In order for a contract to be considered valid, other than following the syntax rules, a few more requirements must be met, which are as follows:


*Overview of Time Progress.* Initially, the time is 0. Let Fₜ be the set of functions executable at time t, i.e., Fₜ = {fᵢ ∈ F | t ∈ T(fᵢ)}. Then Fₜ is either empty, or contains one or more one-party functions, or consists of a single multi-party function. We consider the following cases:


The clock ticks when there are no more valid requests for setting a value for a variable or making a payment. This continues until we reach time T̄(fᵢ). At this time parties can no longer change their choices, and the choices become visible to everyone. The contract proceeds with the execution of the function. If a party fails to make a payment/decision, or if Null is asked to make a payment or a decision, default behavior is enforced. The default value for payments is 0, and the default behavior for other variables is defined as part of the syntax. For example, in the function play of Fig. 2, if a party does not choose, a default value of 0 is enforced and, given the rest of this function, this leads to a definite loss.

Given the notion of time progress we proceed to formalize the notion of "runs" of the contract. This requires the notion of labels, control-flow graphs, valuations, and states, which we describe below.

*Labels.* Starting from 0, we give the contract, beginning and end points of every function, and every command a label. The labels are given in order of appearance. As an example, see the labels in parentheses in Fig. 2.

*Entry and Exit Labels.* We denote the first label (beginning point) of a function fᵢ by i̲ and its last label (end point) by ī.

*Control Flow Graphs (CFGs).* We define the control flow graph CFGᵢ of the function fᵢ in the standard manner, i.e., CFGᵢ = (V, E), where there is a vertex corresponding to every labeled entity inside fᵢ. Each edge e ∈ E has a condition *cond*(e), which is a boolean expression that must be true when traversing that edge. For more details see [19].

*Valuations.* A valuation is a function *val* assigning a value to every variable. Values of numeric variables must be integers in their range, values of identity variables can be party ids or Null, and the value assigned to a map variable m must be a function *val*(m) such that for each identity i, we have R̲(m) ≤ *val*(m)(i) ≤ R̄(m). Given a valuation, we extend it to expressions containing mathematical operations in the straightforward manner.

*States.* A state of the contract is a tuple s = (t, b, l, *val*, c), where t is a time stamp, b ∈ ℕ ∪ {0} is the current balance of the contract, i.e., the total amount paid to the contract minus the total amount of payouts, l is the label that is being executed, *val* assigns values to variables, and c ∈ P ∪ {⊥} is the caller of the current function. c = ⊥ corresponds to the case where the caller is undefined, e.g., when no function is being executed. We use S to denote the set of all states that can appear in a run of the contract, as defined below.

*Runs.* A run ρ of the contract is a finite sequence ρ₀, ρ₁, …, ρᵣ of states, where ρⱼ = (tⱼ, bⱼ, lⱼ, *val*ⱼ, cⱼ), starting from (0, 0, 0, X₀, ⊥), that follows all rules of the contract and ends in a state with time stamp tᵣ > max_{fᵢ} T̄(fᵢ). The following rules must be followed when switching to a new state in a run:

– The clock can only tick when there are no valid pending requests for running a one-party function or deciding or paying in multi-party functions.


*Remark 1.* Note that in our semantics each function body completes its execution in a single tick of the clock. However, a single tick might contain more than one function call and execution.

*Run Prefixes.* We use H to denote the set of all prefixes of runs and denote the last state of η ∈ H by *end*(η). A run prefix η′ is an extension of η if it can be obtained by adding one state to the end of η.

*Probability Distributions.* Given a finite set X, a probability distribution on X is a function δ : X → [0, 1] such that ∑_{x∈X} δ(x) = 1. Given such a distribution, its support, Supp(δ), is the set of all x ∈ X with δ(x) > 0. We denote the set of all probability distributions on X by Δ(X).
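These two definitions can be sketched directly in Python (an illustrative encoding of ours, using exact rationals to keep the sum check precise):

```python
# Sketch: a probability distribution over a finite set as a dict
# value -> probability, with its support Supp(delta) = {x : delta(x) > 0}.
from fractions import Fraction

def is_distribution(delta):
    return (all(0 <= p <= 1 for p in delta.values())
            and sum(delta.values()) == 1)

def support(delta):
    return {x for x, p in delta.items() if p > 0}

# Uniform distribution over three moves, as in rock-paper-scissors.
third = Fraction(1, 3)
delta = {"rock": third, "paper": third, "scissors": third, "none": Fraction(0)}
assert is_distribution(delta)
assert support(delta) == {"rock", "paper", "scissors"}
```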

Typically, for programs it suffices to define runs for the semantics. However, given that there are several parties in a contract, the semantics depends on the possible choices of the parties. Hence we need to define policies for the parties, and such policies define a probability distribution over runs, which constitutes the semantics of contracts. To define policies we first define moves.

*Moves.* We use M for the set of all moves. The moves that can be taken by parties in a contract can be summarized as follows:


*Permitted Moves.* We define Pᵢ : S → 2ᴹ, so that Pᵢ(s) is the set of moves permitted to the party with identity i when the contract is in state s = (t, b, l, *val*, pⱼ). It is formally defined as follows:


*Policies and Randomized Policies.* A policy πᵢ for party i is a function πᵢ : H → M such that for every η ∈ H, πᵢ(η) ∈ Pᵢ(*end*(η)). Intuitively, a policy is a way of deciding which move to use next, given the current run prefix. A policy profile π = (πᵢ) is a sequence assigning one policy to each party i. The policy profile π defines a unique run ρ_π of the contract, which is obtained when the parties choose their moves according to π. A randomized policy ξᵢ for party i is a function ξᵢ : H → Δ(M) such that Supp(ξᵢ(η)) ⊆ Pᵢ(*end*(η)). A randomized policy assigns a probability distribution over all possible moves for party i given the current run prefix of the contract; the party then follows it by choosing a move randomly according to the distribution. We use Ξ to denote the set of all randomized policy profiles, Ξᵢ for the set of randomized policies of party i, and Ξ₋ᵢ for the set of randomized policy profiles of all parties except i. A randomized policy profile ξ is a sequence (ξᵢ) assigning one randomized policy to each party. Each such randomized policy profile induces a unique probability measure on the set of runs, denoted Prob^ξ[·]. We denote the expectation measure associated with Prob^ξ[·] by 𝔼^ξ[·].
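A randomized policy and its support condition can be sketched as follows (a minimal illustration of ours; the state names, the permitted-move function, and the policy itself are all hypothetical):

```python
# Sketch: a randomized policy maps a run prefix to a distribution over
# moves, and its support must lie inside the permitted moves P_i(end(eta)).
from fractions import Fraction

def permitted(state):
    # Hypothetical P_i: in state "play" the party may choose a hand;
    # in state "idle" she may only wait.
    return {"play": {"rock", "paper", "scissors"}, "idle": {"wait"}}[state]

def policy(prefix):
    # Uniform over the three choices whenever the last state allows playing.
    if prefix[-1] == "play":
        return {m: Fraction(1, 3) for m in ("rock", "paper", "scissors")}
    return {"wait": Fraction(1)}

def respects_permitted(prefix):
    dist = policy(prefix)
    supp = {m for m, pr in dist.items() if pr > 0}
    return supp <= permitted(prefix[-1]) and sum(dist.values()) == 1

assert respects_permitted(("idle", "play"))
assert respects_permitted(("idle",))
```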

### **3.3 Objective Function and Values of Contracts**

As mentioned in the introduction, we identify expected payoff as the canonical property of contracts. The previous section defined the expectation measure given randomized policies as the basic semantics. Given the expected payoff, we define the value of a contract as the worst-case guaranteed payoff for a given party. We now formalize the notion of an objective function (the payoff function).

*Objective Function.* An objective o for a party p is in one of the following forms:


Informally, p is trying to choose her moves so as to maximize o.

*Run Outcomes.* Given a run ρ of the program and an objective o for party p, the outcome κ(ρ, o, p) is the value of o computed using the valuation at *end*(ρ) for all variables and accounting for the payments in ρ to compute p⁺ and p⁻.

*Contract Values.* Since we consider worst-case guaranteed payoff, we consider that there is an objective o for a single party p which she tries to maximize and all other parties are adversaries who aim to minimize o. Formally, given a contract C and an objective o for party p, we define the value of contract as:

$$\mathsf{V}(C, o, p) := \sup_{\xi_p \in \Xi_p} \inf_{\xi_{-p} \in \Xi_{-p}} \mathbb{E}^{(\xi_p, \xi_{-p})} \left[ \kappa(\rho, o, p) \right],$$

² We also assume, as in many programming languages, that True = 1 and False = 0.

This corresponds to p trying to maximize the expected value of o while all other parties maliciously collude to minimize it. In other words, it provides the worst-case guarantee for party p, irrespective of the behavior of the other parties, which in the worst case is adversarial to p.

### **3.4 Examples**

One contribution of our work is to present a simplified programming language and to show that this simple language can express several classical smart contracts. To demonstrate this applicability, we present several examples of classical smart contracts in this section. In each example, we present a contract and a "buggy" implementation of the same contract that has a different value. In Sect. 6 we show that our automated approach for analyzing contracts can compute contract values with enough precision to differentiate between the correct and the buggy implementations. All of our examples are motivated by well-known bugs that have occurred in real life on Ethereum.

**Rock-Paper-Scissors.** Let our contract be the one specified in Fig. 2 and assume that we want to analyze it from the point of view of the issuer p. Also, let the objective function be p⁺ − p⁻ + 10 · AliceWon. Intuitively, this means that winning the rock-paper-scissors game is considered to have an additional value of 10, beyond the spending and earnings. The idea behind this is similar to chess tournaments, in which players not only win a prize, but can also use their wins to achieve better "ratings", so winning has extra utility.

A common bug in writing rock-paper-scissors is to allow the parties to move sequentially, rather than concurrently [29]. If the parties move sequentially and the issuer moves after Bob, then she can ensure a utility of 10, i.e., her worst-case expected reward is 10. However, in the correct implementation of Fig. 2, the best strategy for both players is to bid 0, and Alice can then win the game with probability 1/3 by choosing each of the three options with equal probability. Hence, her worst-case expected reward is 10/3.
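The guarantee of 10/3 can be checked by a small computation (an illustrative check of ours, with bids fixed at 0 so the objective reduces to 10 · AliceWon; that the uniform strategy is optimal follows from the symmetry of the game and is not shown here):

```python
# Sketch: Alice's worst-case expected reward in the rock-paper-scissors
# contract, evaluating her uniform mixed strategy against every pure
# response of Bob.
from fractions import Fraction

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def utility(alice, bob):
    # With zero bids, the objective is 10 if Alice wins, 0 otherwise.
    return 10 if BEATS[alice] == bob else 0

uniform = Fraction(1, 3)
guarantee = min(sum(uniform * utility(a, b) for a in MOVES) for b in MOVES)
assert guarantee == Fraction(10, 3)
```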

**Auction.** Consider an open auction, in which during a fixed time interval everyone is allowed to bid for the good being sold and everyone can see others' bids. When the bidding period ends a winner emerges and every other participant can get their money back. Let the variable HighestBid store the value of the highest bid made at the auction. Then for a party p, one can define the objective as:

$$\left(p^{+} - p^{-} + (\mathsf{Winner} = p) \times \mathsf{HighestBid}\right).$$

This is, of course, assuming that the good being sold is worth precisely as much as the highest bid. A correctly written auction should return a value of 0 to every participant, because those who lose the auction get their money back and the party that wins pays precisely the highest bid. The contract in Fig. 3 (left) is an implementation of such an auction. However, it has a subtle problem: the function bid allows the winner to reduce her bid. This bug is fixed in the contract on the right.
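The essence of the bug can be sketched as follows (a hypothetical simplification of ours, not the code of Fig. 3: the buggy bid accepts any positive amount from the current winner, while the fixed one requires the new bid to exceed the highest bid):

```python
# Sketch of the auction bug: the winner can overwrite her own bid with a
# smaller one, so she wins the good while paying less than her winning bid.

def bid_buggy(state, party, amount):
    if amount > 0:  # bug: does not require amount > HighestBid
        state["Winner"], state["HighestBid"] = party, amount

def bid_fixed(state, party, amount):
    if amount > state["HighestBid"]:
        state["Winner"], state["HighestBid"] = party, amount

s = {"Winner": None, "HighestBid": 0}
bid_buggy(s, "p", 100)
bid_buggy(s, "p", 1)        # winner reduces her bid from 100 to 1
assert s == {"Winner": "p", "HighestBid": 1}

s = {"Winner": None, "HighestBid": 0}
bid_fixed(s, "p", 100)
bid_fixed(s, "p", 1)        # rejected: 1 does not exceed the highest bid
assert s == {"Winner": "p", "HighestBid": 100}
```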

**Fig. 3.** A buggy auction contract (left) and its fixed version (right).

**Three-Way Lottery.** Consider a three-party lottery contract issued by a party p. The other two players can sign up by buying tickets worth 1 unit each. Each of the players is then supposed to randomly and uniformly choose a nonce. A combination of these nonces produces the winner, with equal probability for all three parties. If a player does not make a choice or pay the fees, she loses the lottery for certain. The rules are such that if the other two parties choose the same nonce, which is supposed to happen with probability 1/3, then the issuer wins. Otherwise, the winner is chosen according to the parity of the sum of the nonces. This gives everyone a winning probability of 1/3 if all sides play uniformly at random. Moreover, even if one of the sides refuses to play uniformly at random, the resulting winning probabilities stay the same, because each side's probability of winning is independent of her own choice, assuming the others play randomly. We assume that the issuer p has the objective p⁺ − p⁻; this is because the winner can take the other players' money. In a bug-free contract we would expect the value of this objective to be 0, given that winning has probability 1/3. However, there is a bug here, due to the fact that the other parties can collude: for example, the same person might register as both players and then opt for different nonces. This ensures that the issuer loses. The bug can be fixed by ensuring that a party's probability of winning is 1/3 if she honestly plays uniformly at random, no matter what the other parties do. For more details about this contract see [19].

**Token Sale.** Consider a contract that sells *tokens* modeling some aspect of the real world, e.g., shares in a company. At first, anyone can buy tokens at a fixed price of 1 unit per token. However, there is a limited number of tokens available and at most 1000 of them are meant to be sold. The tokens can then be transferred between parties, which is the subject of our next example. For now, Fig. 4 (left) is an implementation of the selling phase. However, there is a significant problem here: one can buy any number of tokens as long as at least one token remains. For example, one might first buy 999 tokens and then buy another 1000. If we analyze the contract from the point of view of a single party p with objective balance[p], then the value must be capped by 1000 in a bug-free contract, while the process described above leads to a value of 1999. The fixed contract is in Fig. 4 (right). This bug is inspired by a very similar real-world bug described in [52].
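The faulty guard can be sketched as follows (our illustrative reconstruction of the bug class described in the text, not the code of Fig. 4):

```python
# Sketch of the token-sale bug: the buggy guard only checks that at least
# one token remains, instead of checking that the purchase fits.

TOTAL = 1000

def buy_buggy(state, party, amount):
    if state["remaining"] >= 1:              # bug: ignores `amount`
        state["remaining"] -= amount
        state["balance"][party] = state["balance"].get(party, 0) + amount

def buy_fixed(state, party, amount):
    if amount <= state["remaining"]:
        state["remaining"] -= amount
        state["balance"][party] = state["balance"].get(party, 0) + amount

s = {"remaining": TOTAL, "balance": {}}
buy_buggy(s, "p", 999)
buy_buggy(s, "p", 1000)                      # "at least one" token still left
assert s["balance"]["p"] == 1999             # exceeds the 1000-token cap

s = {"remaining": TOTAL, "balance": {}}
buy_fixed(s, "p", 999)
buy_fixed(s, "p", 1000)                      # rejected: does not fit
assert s["balance"]["p"] == 999
```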

**Token Transfer.** Consider the same bug-free token sale as in the previous example; we now add a function for transferring tokens. An owner can choose a recipient and an amount less than or equal to her balance, and transfer that many tokens to the recipient. Figure 5 (left) is an implementation of this concept. Taking the same approach and objective as above, we expect a similar result. However, there is again an important bug in this code: what happens if a party transfers tokens to herself? She gets free extra tokens! This has been fixed in the contract on the right. This example models a real-world bug as in [42].
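One common shape of this bug class can be sketched as follows (a hypothetical reconstruction of ours, not the code of Fig. 5: both balances are read up front, so a self-transfer credits a stale balance):

```python
# Sketch of the self-transfer bug: reading both balances before writing
# means that for sender == recipient the credit uses a stale value.

def transfer_buggy(balance, sender, to, amount):
    if 0 < amount <= balance[sender]:
        from_bal, to_bal = balance[sender], balance.get(to, 0)
        balance[sender] = from_bal - amount
        balance[to] = to_bal + amount   # if to == sender, overwrites with stale + amount

def transfer_fixed(balance, sender, to, amount):
    if 0 < amount <= balance[sender]:
        balance[sender] -= amount
        balance[to] = balance.get(to, 0) + amount

b = {"p": 100}
transfer_buggy(b, "p", "p", 100)   # self-transfer
assert b["p"] == 200               # free extra tokens

b = {"p": 100}
transfer_fixed(b, "p", "p", 100)   # self-transfer is now a no-op
assert b["p"] == 100
```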

**Fig. 4.** A buggy token sale (left) and its fixed version (right).

**Translation to Solidity.** All aspects of our programming language are already present in Solidity, except for the global clock and concurrent interactions. The global clock can be modeled by the number of the current block in the blockchain and concurrent interactions can be implemented using commitment schemes. For more details see [19].

### **4 Bounded Analysis and Games**

Since smart contracts can be easily described in our programming language, and programs in our programming language can be translated to Solidity, the

**Fig. 5.** A buggy transfer function (left) and its fixed version (right).

main aim is to automatically compute the values of contracts (i.e., the guaranteed payoffs for the parties). In this section, we introduce the bounded analysis problem for our programming language framework and present concurrent games, which are the underlying mathematical framework for the bounded analysis problem.

#### **4.1 Bounded Analysis**

As is standard in verification, we consider the bounded analysis problem, where the number of parties and the number of function calls are bounded. In standard program analysis, bugs are often detected with a small number of processes, or a small number of context switches between concurrent threads. In the context of smart contracts, we analogously assume that the number of parties and function calls are bounded.

*Contracts with Bounded Number of Parties and Function Calls.* Formally, a contract with bounded number of parties and function calls is as follows:


closely resembles real-life contracts in which one's ability to call many functions is limited by the capacity of a block in the blockchain, given that the block must save all messages.

### **4.2 Concurrent Games**

The programming language framework we consider has interacting agents that act simultaneously, along with a program state. We present the mathematical framework of concurrent games, which are games played on finite state spaces with concurrent interaction between the players.

*Concurrent Game Structures.* A concurrent two-player game structure is a tuple G = (S, s₀, A, Γ₁, Γ₂, δ), where S is a finite set of states, s₀ ∈ S is the start state, A is a finite set of actions, Γ₁, Γ₂ : S → 2ᴬ \ {∅} are functions such that Γᵢ assigns to each state s ∈ S a non-empty set Γᵢ(s) ⊆ A of actions available to player i at s, and finally δ : S × A × A → S is a transition function that assigns to every state s ∈ S and action pair a₁ ∈ Γ₁(s), a₂ ∈ Γ₂(s) a successor state δ(s, a₁, a₂) ∈ S.

*Plays and Histories.* The game starts at state s₀. At each state sᵢ ∈ S, player 1 chooses an action a₁ⁱ ∈ Γ₁(sᵢ) and player 2 chooses an action a₂ⁱ ∈ Γ₂(sᵢ). The choices are made simultaneously and independently. The game then transitions to the new state sᵢ₊₁ = δ(sᵢ, a₁ⁱ, a₂ⁱ) and the same process continues. This leads to an infinite sequence of tuples p = ⟨(sᵢ, a₁ⁱ, a₂ⁱ)⟩ for i ≥ 0, which is called a *play* of the game. We denote the set of all plays by *P*. Every finite prefix p[..r] := (s₀, a₁⁰, a₂⁰), (s₁, a₁¹, a₂¹), …, (sᵣ, a₁ʳ, a₂ʳ) of a play is called a *history*, and the set of all histories is denoted by *H*. If h = p[..r] is a history, we denote by *last*(h) the last state reached in h, i.e., sᵣ₊₁ = δ(sᵣ, a₁ʳ, a₂ʳ). We also define p[..−1] as the empty history.

*Strategies and Mixed Strategies.* A strategy is a recipe that tells a player which action to play given the current game history. Formally, a strategy ϕᵢ for player i is a function ϕᵢ : *H* → A such that ϕᵢ(h) ∈ Γᵢ(*last*(h)). A pair ϕ = (ϕ₁, ϕ₂) of strategies for the two players is called a strategy profile. Each such ϕ induces a unique play. A mixed strategy for player i is a function σᵢ : *H* → Δ(A) that assigns a distribution over actions given the history of the game. Intuitively, such a strategy suggests a distribution of actions to player i at each step, and she then plays one of them randomly according to that distribution. Of course, it must be the case that Supp(σᵢ(h)) ⊆ Γᵢ(*last*(h)). A pair σ = (σ₁, σ₂) of mixed strategies for the two players is called a mixed strategy profile. Note that mixed strategies generalize strategies with randomization. Every mixed strategy profile σ = (σ₁, σ₂) induces a unique probability measure on the set of plays, denoted Prob^σ[·], and the associated expectation measure is denoted by 𝔼^σ[·].

*State and History Utilities.* In a game structure G, a state utility function u for player 1 is of the form u : S → ℝ. Intuitively, this means that when the game enters state s, player 1 receives a reward of u(s). State utilities are extended to history utilities: we define the utility of a history to be the sum of the utilities of all states included in that history. Formally, if h = ⟨(sᵢ, a₁ⁱ, a₂ⁱ)⟩ for 0 ≤ i ≤ r, then u(h) = ∑ᵢ₌₀ʳ u(sᵢ). Given a play p ∈ *P*, we denote the utility of its prefix of length L by u_L(p).

*Games.* A game is a pair (G, u) where G is a game structure and u is a utility function for player 1. We assume that player 1 is trying to maximize u, while player 2's goal is to minimize it.

*Values.* The L-step finite-horizon value of a game (G, u) is defined as

$$\upsilon\_{\mathsf{L}}(G, u) := \sup\_{\sigma\_1} \inf\_{\sigma\_2} \mathbb{E}^{(\sigma\_1, \sigma\_2)} \left[ u\_{\mathsf{L}}(p) \right], \tag{1}$$

where σᵢ ranges over all possible mixed strategies of player i. This models the fact that player 1 tries to maximize the utility over the first L steps of the play, while player 2 minimizes it. The values of games can be computed using the standard value-iteration algorithm, i.e., dynamic programming. A more detailed overview of the algorithms for games is provided in [19].
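To make the finite-horizon computation concrete, here is a minimal value-iteration sketch (all names are ours, and we use the convention v₀(s) = u(s) and v_L(s) = u(s) plus the value of the one-shot matrix game over v_{L−1} of the successors). For brevity it solves the one-shot matrix games exactly only in the 1×N, N×1, and 2×2 cases; general matrix games require linear programming, as discussed in [19]:

```python
# Sketch of finite-horizon value iteration for concurrent games.
from fractions import Fraction

def matrix_value(M):
    """Value of a zero-sum one-shot matrix game (player 1 = row maximizer)."""
    rows, cols = len(M), len(M[0])
    if rows == 1:
        return min(M[0])                      # player 2 minimizes
    if cols == 1:
        return max(r[0] for r in M)           # player 1 maximizes
    if rows == cols == 2:
        lo = max(min(r) for r in M)                        # pure maximin
        hi = min(max(r[j] for r in M) for j in range(2))   # pure minimax
        if lo == hi:                          # saddle point in pure actions
            return lo
        (a, b), (c, d) = M                    # standard 2x2 mixed-value formula
        return Fraction(a * d - b * c) / (a + d - b - c)
    raise NotImplementedError("general matrix games need an LP solver")

def value(game, s, L):
    """v_L(s) for game = (actions1, actions2, delta, u)."""
    A1, A2, delta, u = game
    if L == 0:
        return u[s]
    M = [[value(game, delta[s, a1, a2], L - 1) for a2 in A2[s]]
         for a1 in A1[s]]
    return u[s] + matrix_value(M)

# A one-step "matching pennies" game: from s0 both players pick 0 or 1;
# player 1 gets utility 1 iff the picks match, so the value is 1/2.
A1 = {"s0": [0, 1], "win": [0], "lose": [0]}
A2 = dict(A1)
delta = {("s0", i, j): ("win" if i == j else "lose")
         for i in (0, 1) for j in (0, 1)}
delta.update({(t, 0, 0): t for t in ("win", "lose")})
u = {"s0": 0, "win": 1, "lose": 0}
assert value((A1, A2, delta, u), "s0", 1) == Fraction(1, 2)
```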

*Remark 2.* Note that in (1), limiting player 2 to pure strategies does not change the value of the game. Hence, we can assume that player 2 is an arbitrarily powerful nondeterministic adversary and get the exact same results.

#### **4.3 Translating Contracts to Games**

The translation from bounded smart contracts to games is straightforward: the states of the concurrent game encode the states of the contract. The correspondences between objects of the contract and of the game are as follows: (a) moves in contracts correspond to actions in games; (b) run prefixes in contracts to histories in games; (c) runs in contracts to plays in games; and (d) policies (resp., randomized policies) in contracts to strategies (resp., mixed strategies) in games. Note that since all runs of the bounded contract are finite and have a limited length, we can apply finite-horizon analysis to the resulting game, where L is the maximal length of a run in the contract. This gives us the following theorem:

**Theorem 1 (Correspondence).** *Given a bounded contract* C<sup>k</sup> *for a party* p *with objective* o*, a concurrent game can be constructed such that value of this game,* υL(G, u)*, is equal to the value of the bounded contract,* V(Ck, o, p)*.*

For details of the translation of smart contracts to games and proof of the theorem above see [19].

*Remark 3.* In standard programming languages, there are no interacting parties and hence the underlying mathematical models are graphs. In contrast, for smart contract programming languages, where parties interact in a game-like manner, we have to consider games as the mathematical basis of our analysis.

### **5 Abstraction for Quantitative Concurrent Games**

Abstraction is a key technique for handling large-scale systems. In the previous section we showed that smart contracts can be translated to games, but due to state-space explosion (since we allow integer variables), the resulting state space of the game is huge. Hence, we need techniques for abstraction, as well as refinement of abstraction, for concurrent games with quantitative utilities. In this section we present such abstraction refinement for quantitative concurrent games, which is our main technical contribution in this paper. We show the soundness of our approach and its completeness in the limit. Then, we introduce a specific method of abstraction, called interval abstraction, which we apply to the games obtained from contracts, and we show that soundness and refinement are inherited from the general case. We also provide a heuristic for faster refinement of interval abstractions for games obtained from contracts.

### **5.1 Abstraction for Quantitative Concurrent Games**

Abstraction considers a partition of the state space and reduces the number of states by taking each partition set as a state. In the case of transition systems (or graphs), the standard technique is to consider existential (or universal) abstraction to define transitions between the partition sets. However, for game-theoretic interactions such abstraction ideas do not suffice. We now describe the key intuition for abstraction in concurrent games with quantitative objectives and formalize it. We also provide a simple example for illustration.

*Abstraction Idea and Key Intuition.* In an abstraction, the state space of the game (G, u) is partitioned into several abstract states, where an abstract state represents a set of similar states of the original game. Given an abstraction, our goal is to define two games that provide a lower and an upper bound on the value of the original game. This leads to the concepts of lower and upper abstraction.


Informally, the lower abstraction gives more power to the adversary, player 2, whereas the upper abstraction is favorable to player 1.

*General Abstraction for Concurrent Games.* Given a game (G, u) consisting of a game structure G = (S, s<sub>0</sub>, A, Γ<sub>1</sub>, Γ<sub>2</sub>, δ) and a utility function u, and a partition Π of S, the lower and upper abstractions, (G<sup>↓</sup> = (S<sup>a</sup>, s<sup>a</sup><sub>0</sub>, A<sup>a</sup>, Γ<sup>↓</sup><sub>1</sub>, Γ<sup>↓</sup><sub>2</sub>, δ<sup>↓</sup>), u<sup>↓</sup>) and (G<sup>↑</sup> = (S<sup>a</sup>, s<sup>a</sup><sub>0</sub>, A<sup>a</sup>, Γ<sup>↑</sup><sub>1</sub>, Γ<sup>↑</sup><sub>2</sub>, δ<sup>↑</sup>), u<sup>↑</sup>), of (G, u) with respect to Π are defined as:


Given a partition Π of S, either (i) there is no lower or upper abstraction corresponding to it because it puts states with different sets of available actions together; or (ii) there is a unique lower and upper abstraction pair. Hence we will refer to the unique abstracted pair of games by specifying Π only.

*Remark 4.* Dummy states are introduced for conceptual clarity in explaining the ideas, because in the lower abstraction all choices in them are assigned to player 2 and in the upper abstraction to player 1. In practice, however, there is no need to create them, as the choices can be given to the respective player in the predecessor state.

*Example.* Figure 6 (left) shows a concurrent game (G, u) with 4 states. The utilities are denoted in red. The edges correspond to transitions in δ and each edge is labeled with its corresponding action pair. Here A = {a, b}, Γ<sub>1</sub>(s<sub>0</sub>) = Γ<sub>2</sub>(s<sub>0</sub>) = Γ<sub>2</sub>(s<sub>1</sub>) = Γ<sub>1</sub>(s<sub>2</sub>) = Γ<sub>2</sub>(s<sub>2</sub>) = Γ<sub>2</sub>(s<sub>3</sub>) = A and Γ<sub>1</sub>(s<sub>1</sub>) = Γ<sub>1</sub>(s<sub>3</sub>) = {a}. Given that the action sets for s<sub>0</sub> and s<sub>2</sub> are equal, we can create abstracted games using the partition Π = {π<sub>0</sub>, π<sub>1</sub>, π<sub>2</sub>} where π<sub>0</sub> = {s<sub>0</sub>, s<sub>2</sub>} and the other sets are singletons. The resulting game structure is depicted in Fig. 6 (center). Dummy states are shown as circles, and whenever a play reaches a dummy state in G<sup>↓</sup>, player 2 chooses which red edge should be taken. Conversely, in G<sup>↑</sup> player 1 makes this choice. Also, u<sup>↑</sup>(π<sub>0</sub>) = max{u(s<sub>0</sub>), u(s<sub>2</sub>)} = 10, u<sup>↓</sup>(π<sub>0</sub>) = min{u(s<sub>0</sub>), u(s<sub>2</sub>)} = 0

**Fig. 6.** An example concurrent game (left), abstraction process (center) and the corresponding *G*<sup>↓</sup> without dummy states (right).

and u<sup>↑</sup>(π<sub>1</sub>) = u<sup>↓</sup>(π<sub>1</sub>) = u(s<sub>1</sub>) = 10, u<sup>↑</sup>(π<sub>2</sub>) = u<sup>↓</sup>(π<sub>2</sub>) = u(s<sub>3</sub>) = 0. The final abstracted game G<sup>↓</sup> of the example above, without dummy states, is given in Fig. 6 (right).
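The abstract utilities in this example are just blockwise minima and maxima. A minimal sketch in Python (the helper name is ours; the concrete utilities u(s0)=0, u(s2)=10 are one assignment consistent with the min/max values above):

```python
def abstract_utilities(u, partition):
    """Given concrete utilities u (state -> value) and a partition
    (abstract state -> set of concrete states), build the abstract
    utilities: u_up takes the max over each block, u_down the min."""
    u_up = {pi: max(u[s] for s in block) for pi, block in partition.items()}
    u_down = {pi: min(u[s] for s in block) for pi, block in partition.items()}
    return u_down, u_up

# The example game: s0 and s2 are merged into one abstract state pi0.
u = {'s0': 0, 's1': 10, 's2': 10, 's3': 0}   # u(s0), u(s2) assumed
partition = {'pi0': {'s0', 's2'}, 'pi1': {'s1'}, 'pi2': {'s3'}}
u_down, u_up = abstract_utilities(u, partition)
```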

### **5.2 Abstraction: Soundness, Refinement, and Completeness in Limit**

For an abstraction we need three key properties: (a) soundness, (b) refinement of the abstraction, and (c) completeness in the limit. Intuitively: (a) soundness requires that the value of the game lie between the values of the lower and upper abstractions; (b) refinement requires that if the partition is refined, then the values of the lower and upper abstractions become closer; and (c) completeness requires that if the partitions are refined enough, then the value of the original game can be approximated arbitrarily well. We present each of these results below.

**Soundness.** Soundness means that when we apply abstraction, the value of the original game must lie between the values of the lower and upper abstractions. Intuitively, the abstractions provide an interval containing the value of the game. We expect the value of (G<sup>↓</sup>, u<sup>↓</sup>) to be less than or equal to the value of the original game because in (G<sup>↓</sup>, u<sup>↓</sup>) the utilities are smaller than in (G, u) and player 2 has more power, given that she can choose which transition to take. Conversely, we expect (G<sup>↑</sup>, u<sup>↑</sup>) to have a higher value than (G, u).

*Formal Requirement for Soundness.* An abstraction of a game (G, u) leading to the abstraction pair (G<sup>↑</sup>, u<sup>↑</sup>), (G<sup>↓</sup>, u<sup>↓</sup>) is sound if for every L, we have υ<sub>2L</sub>(G<sup>↓</sup>, u<sup>↓</sup>) ≤ υ<sub>L</sub>(G, u) ≤ υ<sub>2L</sub>(G<sup>↑</sup>, u<sup>↑</sup>). The factor 2 in the inequalities is due to the fact that each transition in the original game is modeled by two transitions in the abstracted games: one into a dummy state and a second one out of it. We now present our soundness result.

**Theorem 2 (Soundness, Proof in** [19]**).** *Given a game* (G, u) *and a partition* Π *of its state space, if* G<sup>↑</sup> *and* G<sup>↓</sup> *exist, then the abstraction is sound, i.e., for all* L *it is the case that* υ<sub>2L</sub>(G<sup>↓</sup>, u<sup>↓</sup>) ≤ υ<sub>L</sub>(G, u) ≤ υ<sub>2L</sub>(G<sup>↑</sup>, u<sup>↑</sup>)*.*

**Refinement.** We say that a partition Π<sub>2</sub> is a refinement of a partition Π<sub>1</sub>, and write Π<sub>2</sub> ⊑ Π<sub>1</sub>, if every π ∈ Π<sub>1</sub> is a union of several π<sub>i</sub>'s in Π<sub>2</sub>, i.e. π = ⋃<sub>i∈I</sub> π<sub>i</sub> with π<sub>i</sub> ∈ Π<sub>2</sub> for all i ∈ I. Intuitively, this means that Π<sub>2</sub> is obtained by further subdividing the partition sets in Π<sub>1</sub>. It is easy to check that ⊑ is a partial order over partitions. We expect that if Π<sub>2</sub> ⊑ Π<sub>1</sub>, then the abstracted games resulting from Π<sub>2</sub> give a better approximation of the value of the original game than those resulting from Π<sub>1</sub>. This is called the refinement property.
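The refinement relation between two partitions can be checked mechanically. A small sketch (partitions represented as lists of frozensets; the function name is ours):

```python
def refines(pi2, pi1):
    """True iff partition pi2 refines pi1: every block of pi1 is exactly
    the union of the pi2-blocks it intersects."""
    for block1 in pi1:
        # pi2-blocks that touch block1 must lie inside it and tile it
        pieces = [b for b in pi2 if b & block1]
        if any(not (b <= block1) for b in pieces):
            return False
        if frozenset().union(*pieces) != block1:
            return False
    return True
```

For example, splitting the block {s0, s2} into singletons yields a refinement of the original partition, while the converse check fails.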

*Formal Requirement for the Refinement Property.* Two abstractions of a game (G, u) using two partitions Π<sub>1</sub>, Π<sub>2</sub> with Π<sub>2</sub> ⊑ Π<sub>1</sub>, leading to abstracted games (G<sup>↑</sup><sub>i</sub>, u<sup>↑</sup><sub>i</sub>), (G<sup>↓</sup><sub>i</sub>, u<sup>↓</sup><sub>i</sub>) corresponding to each Π<sub>i</sub>, satisfy the refinement property if for every L we have υ<sub>2L</sub>(G<sup>↓</sup><sub>1</sub>, u<sup>↓</sup><sub>1</sub>) ≤ υ<sub>2L</sub>(G<sup>↓</sup><sub>2</sub>, u<sup>↓</sup><sub>2</sub>) ≤ υ<sub>2L</sub>(G<sup>↑</sup><sub>2</sub>, u<sup>↑</sup><sub>2</sub>) ≤ υ<sub>2L</sub>(G<sup>↑</sup><sub>1</sub>, u<sup>↑</sup><sub>1</sub>).

**Theorem 3 (Refinement Property, Proof in** [19]**).** *Let* Π<sub>2</sub> ⊑ Π<sub>1</sub> *be two partitions of the state space of a game* (G, u)*. Then the abstractions corresponding to* Π<sub>1</sub>, Π<sub>2</sub> *satisfy the refinement property.*

**Completeness in the Limit.** We say that an abstraction is complete in the limit, if by refining it enough the values of upper and lower abstractions get as close together as desired. Equivalently, this means that if we want to approximate the value of the original game within some predefined threshold of error, we can do so by repeatedly refining the abstraction.

*Formal Requirement for Completeness in the Limit.* Given a game (G, u), a fixed finite horizon L, and an abstracted game pair corresponding to a partition Π<sub>1</sub>, the abstraction is said to be complete in the limit if for every ε ≥ 0 there exists Π<sub>2</sub> ⊑ Π<sub>1</sub> such that if (G<sup>↓</sup><sub>2</sub>, u<sup>↓</sup><sub>2</sub>), (G<sup>↑</sup><sub>2</sub>, u<sup>↑</sup><sub>2</sub>) are the abstracted games corresponding to Π<sub>2</sub>, then υ<sub>L</sub>(G<sup>↑</sup><sub>2</sub>, u<sup>↑</sup><sub>2</sub>) − υ<sub>L</sub>(G<sup>↓</sup><sub>2</sub>, u<sup>↓</sup><sub>2</sub>) ≤ ε.

**Theorem 4 (Completeness in the Limit, Proof in** [19]**).** *Every abstraction on a game* (G, u) *using a partition* Π *is complete in the limit for all values of* L*.*

### **5.3 Interval Abstraction**

In this section, we turn our focus to games obtained from contracts and provide a specific method of abstraction that can be applied to them.

*Intuitive Overview.* Let (G, u) be a concurrent game obtained from a contract as in Sect. 4.3. The states of G, other than the unique dummy state, correspond to states of the contract C<sup>k</sup>. Hence they are of the form s = (t, b, l, val, p), where t is the time, b is the contract balance, l is a label, val is a valuation, and p is the party calling the current function. In an abstraction, one cannot group states with different times, labels, or callers together, because they might have different moves and hence different action sets in the corresponding game. The main idea of interval abstraction is to partition the states according to intervals over their balance and valuations. We can then refine the abstraction by making the intervals smaller. We now formalize this concept.

*Objects.* Given a contract C<sup>k</sup>, let O be the set of all objects that can have an integral value in a state s of the contract. This consists of the contract balance, the numeric variables, and the m[p]'s where m is a map variable and p is a party. More precisely, O = {B} ∪ N ∪ {m[p] | m ∈ M, p ∈ P}, where B denotes the balance. For o ∈ O, the value assigned to o at state s is denoted by o<sub>s</sub>.

*Interval Partition.* Let C<sup>k</sup> be a contract and (G, u) its corresponding game. A partition Π of the state space of G is called an interval partition if:


We call an abstraction using an interval partition, an interval abstraction.
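To illustrate how an interval partition groups states, here is a sketch (the data layout and names are ours, not the paper's): states agreeing on time, label, and caller, and whose balance and valuation fall into the same intervals, map to the same abstract state.

```python
import bisect

def interval_abstract(state, cutpoints):
    """Map a concrete contract state (t, b, l, val, p) to its abstract
    state under an interval partition. cutpoints maps 'B' (the balance)
    and each numeric object to a sorted list of interval endpoints.
    Time t, label l and caller p stay exact, since states differing on
    them may have different available moves."""
    t, b, l, val, p = state
    def bucket(o, x):
        return bisect.bisect_right(cutpoints[o], x)   # index of x's interval
    abs_val = tuple(sorted((o, bucket(o, x)) for o, x in val.items()))
    return (t, bucket('B', b), l, abs_val, p)

cuts = {'B': [10, 100], 'x': [5]}
s1 = (1, 3, 'L0', {'x': 2}, 'p1')
s2 = (1, 7, 'L0', {'x': 4}, 'p1')   # same buckets as s1: same abstract state
```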

*Refinement Heuristic.* We can start with large intervals and repeatedly break them into smaller ones to obtain refined abstractions and a finer approximation of the game value. We use the following heuristic to choose which intervals to break. Assume that the current abstracted pair of games is (G<sup>↓</sup>, u<sup>↓</sup>) and (G<sup>↑</sup>, u<sup>↑</sup>), corresponding to an interval partition Π. Let *d* = (π<sub>d</sub>, a<sub>1</sub>, a<sub>2</sub>) be a dummy state in G<sup>↑</sup> and define the skewness of *d* as υ(G<sup>↑</sup><sub>d</sub>, u<sup>↑</sup>) − υ(G<sup>↓</sup><sub>d</sub>, u<sup>↓</sup>). Intuitively, the skewness of *d* measures how different the outcomes of the games G<sup>↑</sup> and G<sup>↓</sup> are from the point where they have reached *d*. Take a label l with maximal average skewness among its corresponding dummy states and cut all its non-unit intervals into more parts to obtain a new partition Π′. Continue the same process until the approximation is as precise as desired. Intuitively, the heuristic refines the parts of the abstraction that show the most disparity between G<sup>↓</sup> and G<sup>↑</sup>, with the aim of bringing their values closer. Our experiments show its effectiveness.
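The heuristic itself is easy to sketch (the function names and the halving policy are our illustrative choices; the paper only requires cutting non-unit intervals into more parts):

```python
from collections import defaultdict

def pick_label_to_refine(dummies):
    """dummies: list of (label, skewness) pairs, one per dummy state.
    Returns the label with maximal average skewness; its non-unit
    intervals are the ones the heuristic subdivides next."""
    total, count = defaultdict(float), defaultdict(int)
    for label, skew in dummies:
        total[label] += skew
        count[label] += 1
    return max(total, key=lambda l: total[l] / count[l])

def split_intervals(intervals):
    """Halve every non-unit interval [lo, hi]; unit intervals stay."""
    out = []
    for lo, hi in intervals:
        if hi - lo <= 0:                 # unit interval, cannot be split
            out.append((lo, hi))
        else:
            mid = (lo + hi) // 2
            out.extend([(lo, mid), (mid + 1, hi)])
    return out
```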

*Soundness and Completeness in the Limit.* If we restrict our attention to interval abstractions, soundness is inherited from general abstractions and completeness in the limit holds because Π<sup>∗</sup> is an interval partition. Therefore, using interval abstractions is both sound and complete in the limit.

*Interval Refinement.* An interval partition Π′ is an interval refinement of a given interval partition Π if Π′ ⊑ Π. Intuitively, this means that Π′ is obtained by breaking the intervals in some sets of Π into smaller intervals. The refinement property is inherited from general abstractions.

*Conclusion.* We devised a sound abstraction-refinement method for approximating values of contracts. Our method is also complete in the limit. It begins by converting the contract to a game, then applies interval abstraction to the resulting game and repeatedly refines the abstraction using a heuristic until the desired precision is reached.

### **6 Experimental Results**

**Implementation and Optimizations.** The state space of the games corresponding to smart contracts is huge, so the original game corresponding to a contract is computationally too expensive to construct. Therefore, rather than first constructing the game and then applying abstraction, we apply the interval abstraction directly, constructing the lower and upper abstractions and computing values on them. We further optimized our implementation by removing dummy states and by exploiting acyclicity using backward induction. More details are provided in [19].
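Since every run has bounded length, the abstracted games are acyclic and values can be computed bottom-up. A minimal backward-induction sketch for the turn-based special case, where at each state only one player has a real choice (a genuinely concurrent state would instead require solving a matrix game):

```python
def backward_induction(game, state):
    """Finite-horizon value by backward induction on an acyclic game.
    game maps a state either to ('leaf', utility) or to (player, successors),
    where player 1 maximizes and player 2 minimizes. Acyclicity guarantees
    the recursion terminates without memoization."""
    node = game[state]
    if node[0] == 'leaf':
        return node[1]
    player, succs = node
    vals = [backward_induction(game, s) for s in succs]
    return max(vals) if player == 1 else min(vals)

# A tiny illustrative game: player 1 moves at 'a', player 2 at 'b'.
game = {'a': (1, ['b', 'c']), 'b': (2, ['d', 'e']),
        'c': ('leaf', 3), 'd': ('leaf', 5), 'e': ('leaf', 1)}
```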

**Experimental Results.** We present our experimental results (Table 1) for the five examples mentioned in Sect. 3.4. In each example the original game is quite large, and the size of its state space is calculated without constructing it. Our results show the abstracted game sizes, the refinement of the games to larger sizes, and how the lower and upper bounds on the values change. We used an Ubuntu machine with a 3.2 GHz Intel i7-5600U CPU and 12 GB RAM.

*Interpretation of the Experimental Results.* Our results demonstrate the effectiveness of our approach in automatically approximating values of large games and real-world smart contracts. Concretely, the following points are shown:


### **7 Comparison with Related Work**

*Blockchain Security Analysis.* The first security analysis of the Bitcoin protocol was done by Nakamoto [43], who showed resilience of the blockchain against double-spending. Stateful analyses were performed by Sapirshtein et al. [47] and by Sompolinsky and Zohar [49], in which states of the blockchain were considered. These analyses used MDPs, in which only the attacker decides on her actions and the victim follows a predefined protocol. Our paper is the first to use two-player concurrent games to analyze contracts, and the first to apply stateful analysis to arbitrary smart contracts rather than a specific protocol.

**Table 1.** Experimental results for correct and buggy contracts. *l* := υ(*G*↓*, u*↓) denotes the lower value and *u* := υ(*G*↑*, u*↑) is the upper value. Times are in seconds.






*Smart Contract Security.* Delmolino et al. [29] held a contract programming workshop and showed that even simple contracts can contain incentive-misalignment bugs. Luu et al. [41] introduced a symbolic model checker with which they could detect specific erroneous patterns; however, this model checker cannot be extended to game-theoretic analysis. Bhargavan et al. [9] translated Solidity programs to F<sup>∗</sup> and then used standard verification tools to detect vulnerable code patterns. See [7] for a survey of the known causes of Solidity bugs that result in security vulnerabilities.

*Games and Verification.* Abstraction for concurrent games has been considered with respect to qualitative temporal objectives [3,22,28,44]. Several works consider concurrent games with only pure strategies [28,36,37]. Concurrent games with pure strategies are extremely restrictive and effectively similar to turn-based games: the min-max theorem (determinacy) does not hold for them even in the special cases of one-shot games or games with qualitative objectives.

Quantitative analysis with games has been studied in [12,17,21]. However, these approaches either consider games without concurrent interaction or do not consider abstraction-refinement. A quantitative abstraction-refinement framework is considered in [18]; however, it involves no game-theoretic interaction. Abstraction-refinement for games has also been considered [20,36]; however, these works consider neither games with concurrent interaction nor quantitative objectives. Moreover, [20,36] start with a finite-state model without variables, and interval abstraction is not applicable to their game-theoretic frameworks. In contrast, our technical contribution is an abstraction-refinement approach for quantitative games and its application to the analysis of smart contracts.

*Formal Methods in Security.* There is a huge body of work on program analysis for security; see [1,46] for surveys. Formal methods are used to create safe programming languages (e.g., [34,46]) and to define new logics that can express security properties (e.g., [5,6,15]). They are also used to automatically verify security and cryptographic protocols; see [2,8,11] for surveys. However, all of these works aim to formalize qualitative properties such as privacy violation and information leakage. To the best of our knowledge, our framework is the first attempt to use formal methods as a tool for reasoning about monetary losses and identifying them as security errors.

*Bounded Model Checking (BMC).* BMC was proposed by Biere et al. in 1999 [10]. The idea of BMC is to search for a counterexample among executions whose length is at most k. If no bug is found, one increases k until either a bug is found, the problem becomes intractable, or some pre-known upper bound is reached.
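The BMC loop just described might be sketched as follows (an illustrative breadth-first variant over an explicit transition relation, not Biere et al.'s SAT-based procedure):

```python
def bounded_model_check(init, successors, is_bad, k_max):
    """Look for a bad state reachable from init in at most k steps,
    growing k until a counterexample is found or k_max is reached.
    Returns the depth of the counterexample, or None if none exists
    within the bound."""
    frontier = {init}
    for k in range(k_max + 1):
        if any(is_bad(s) for s in frontier):
            return k                      # counterexample of length k
        frontier = {s2 for s in frontier for s2 in successors(s)}
    return None                           # no bug within the bound
```

For instance, on the one-counter system with successor n + 1 and bad state 3, the bug is found at depth 3; with bound 2 the search reports no bug.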

*Interval Abstraction.* The first infinite abstract domain was introduced in [25]. This was later used to prove that infinite abstract domains can lead to effective static analysis for a given programming language [26]. However, none of the standard techniques is applicable to game analysis.

### **8 Conclusion**

In this work we present a programming language for smart contracts and an abstraction-refinement approach for quantitative concurrent games that automatically analyzes such contracts (i.e., computes their worst-case guaranteed utilities). This is the first time a quantitative, stateful, game-theoretic framework has been studied for the formal analysis of smart contracts. There are several interesting directions for future work. First, we present interval-based abstraction techniques for such games; whether different abstraction techniques can lead to more scalability or handle other classes of contracts is an interesting direction. Second, since we consider worst-case guarantees, the games we obtain are two-player zero-sum games; extending the study to multiplayer games and computing values for rational agents is another interesting direction. Finally, in this work we do not consider interaction between smart contracts, and an extension encompassing such interaction will be a subject of its own.

**Acknowledgments.** The research was partially supported by Vienna Science and Technology Fund (WWTF) Project ICT15-003, Austrian Science Fund (FWF) NFN Grant No S11407-N23 (RiSE/SHiNE), and ERC Starting grant (279307: Graph Games).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Session Types and Concurrency

### **Session-Typed Concurrent Contracts**

Hannah Gommerstadt(B), Limin Jia, and Frank Pfenning

Carnegie Mellon University, Pittsburgh, PA, USA {hgommers,fp}@cs.cmu.edu, liminjia@cmu.edu

**Abstract.** In sequential languages, dynamic contracts are usually expressed as boolean functions without externally observable effects, written within the language. We propose an analogous notion of concurrent contracts for languages with session-typed message-passing concurrency. Concurrent contracts are partial identity processes that monitor the bidirectional communication along channels and raise an alarm if a contract is violated. Concurrent contracts are session-typed in the usual way and must also satisfy a transparency requirement, which guarantees that terminating compliant programs with and without the contracts are observationally equivalent. We illustrate concurrent contracts with several examples. We also show how to generate contracts from a refinement session-type system and show that the resulting monitors are redundant for programs that are well-typed.

**Keywords:** Contracts · Session types · Monitors

### **1 Introduction**

Contracts, specifying the conditions under which software components can safely interact, have been used for ensuring key properties of programs for decades. Recently, contracts for distributed processes have been studied in the context of session types [15,17]. These contracts can enforce the communication protocols, specified as session types, between processes. In this setting, we can assign each channel a monitor for detecting whether messages observed along the channel adhere to the prescribed session type. The monitor can then detect any deviant behavior the processes exhibit and trigger alarms. However, contracts based solely on session types are inherently limited in their expressive power. Many contracts that we would like to enforce cannot even be stated using session types alone. As a simple example, consider a "factorization service" which may be sent a (possibly large) integer x and is supposed to respond with a list of prime factors. Session types can only express that the request is an integer and the response is a list of integers, which is insufficient.

In this paper, we show that by generalizing the class of monitors beyond those derived from session types, we can enforce, for example, that multiplying the numbers in the response yields the original integer x. This paper focuses on monitoring more expressive contracts, specifically those that cannot be expressed with session types, or even refinement types.

To handle these contracts, we have designed a model where our monitors execute as transparent processes alongside the computation. These monitors can maintain internal state, which allows us to check complex properties. The monitoring processes act as partial identities, which do not affect the computation except possibly by raising an alarm; they merely observe the messages flowing through the system. They then perform whatever computation is needed (for example, computing the product of the factors) to determine whether the messages are consistent with the contract. If a message is not consistent, they stop the computation and blame the process responsible for the mistake. To show that our contracts subsume refinement-based contracts, we encode refinement types in our model by translating refinements into monitors. This encoding is useful because we can show a blame (safety) theorem stating that monitors enforcing a less precise refinement type than the type of the monitored process will not raise alarms. Unfortunately, the blame theory for the general model is challenging because the contracts cannot be expressed as types.
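For the factorization service of the introduction, the stateful check such a monitor performs can be sketched as follows (a Python stand-in for the session-typed monitor process; the names are ours):

```python
def factorization_monitor(x, factors):
    """Contract check for the factorization service: the response must be
    a list of primes whose product equals the original integer x. The
    monitor keeps internal state (the running product) and raises an
    alarm blaming the provider on any violation."""
    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    product = 1
    for f in factors:
        if not is_prime(f):
            raise AssertionError(f"blame provider: {f} is not prime")
        product *= f
    if product != x:
        raise AssertionError(f"blame provider: product {product} != {x}")
    return True
```

Note that this property is exactly the kind of contract a plain session type cannot express: the type only says "integer in, list of integers out".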

The main contributions of this paper are:


The rest of this paper is organized as follows. We first review the background on session types in Sect. 2. Next, we show a range of example contracts in Sect. 3. In Sect. 4, we show how to check that a monitor process is a partial identity and prove the method correct. We then show how we can encode refinements in our system in Sect. 5. We discuss related work in Sect. 6. Due to space constraints, we only present the key theorems. Detailed proofs can be found in our companion technical report [12].

### **2 Session Types**

Session types prescribe the communication behavior of message-passing concurrent processes. We approach them here via their foundation in intuitionistic linear logic [4,5,22]. The key idea is that an intuitionistic linear sequent

$$A_1, \ldots, A_n \vdash C$$

is interpreted as the interface to a *process expression* P. We label each of the antecedents with a channel name a<sup>i</sup> and the succedent with a channel name c. The a<sup>i</sup> are the channels *used* and c is the channel *provided* by P.

$$a_1 : A_1, \ldots, a_n : A_n \vdash P :: (c:C)$$

We abbreviate the antecedents by Δ. All the channels a<sup>i</sup> and c must be distinct, and bound variables may be silently renamed to preserve this invariant in the rules. Furthermore, the antecedents are considered modulo exchange. Cut corresponds to parallel composition of two processes that communicate along a private channel x, where P is the *provider* along x and Q the *client*.

$$\frac{\Delta \vdash P :: (x:A) \quad x:A, \Delta' \vdash Q :: (c:C)}{\Delta, \Delta' \vdash x:A \gets P ; Q :: (c:C)} \text{ cut}$$

Operationally, the process <sup>x</sup> <sup>←</sup> <sup>P</sup> ; <sup>Q</sup> spawns <sup>P</sup> as a new process and continues as Q, where P and Q communicate along a fresh channel a, which is substituted for x. We sometimes omit the type A of x in the syntax when it is not relevant.

In order to define the operational semantics rigorously, we use *multiset rewriting* [6]. The configuration of executing processes is described as a collection C of propositions proc(c, P) (process P is executing, providing along c) and msg(c, M) (message M is sent along c). All the channels c provided by processes and messages in a configuration must be distinct.

A cut spawns a new process, and is in fact the only way new processes are spawned. We describe a transition C −→ C′ by defining how a subset of C can be rewritten to a subset of C′, possibly with a freshness condition that applies to all of C in order to guarantee the uniqueness of each channel provided.

$$\mathsf{proc}(c, x{:}A \gets P \; ; Q) \longrightarrow \mathsf{proc}(a, [a/x]P), \mathsf{proc}(c, [a/x]Q) \quad (a \; fresh)$$

Each of the connectives of linear logic then describes a particular kind of communication behavior which we capture in similar rules. Before we move on to that, we consider the identity rule, in logical form and operationally.

$$\frac{}{A \vdash A} \; \mathsf{id} \qquad \frac{}{b{:}A \vdash a \gets b :: (a{:}A)} \; \mathsf{id} \qquad \mathsf{proc}(a, a \gets b), \mathcal{C} \longrightarrow [b/a]\mathcal{C}$$

Operationally, it corresponds to identifying the channels a and b, which we implement by substituting <sup>b</sup> for <sup>a</sup> in the remainder <sup>C</sup> of the configuration (which we make explicit in this rule). The process offering <sup>a</sup> terminates. We refer to <sup>a</sup> <sup>←</sup> <sup>b</sup> as *forwarding* since any messages along a are instead "forwarded" to b.

We consider each class of session type constructors, describing their process expression, typing, and asynchronous operational semantics. The linear logical semantics can be recovered by ignoring the process expressions and channels.

**Internal and External Choice.** Even though we distinguish a *provider* and its *client*, this distinction is orthogonal to the direction of communication: both may either send or receive along a common private channel. Session typing guarantees that both sides will always agree on the direction and kind of message that is sent or received, so our situation corresponds to so-called *binary session types*.

First, the *internal choice* c : A ⊕ B requires the provider to send a token inl or inr along c and continue as prescribed by type A or B, respectively. For practical programming, it is more convenient to support the n-ary labelled choice ⊕{ℓ : A<sub>ℓ</sub>}<sub>ℓ∈L</sub>, where L is a set of labels. A process providing c : ⊕{ℓ : A<sub>ℓ</sub>}<sub>ℓ∈L</sub> sends a label k ∈ L along c and continues with type A<sub>k</sub>. The client operates dually, branching on the label received along c.

$$\frac{k \in L \quad \Delta \vdash P :: (c:A_k)}{\Delta \vdash c.k \; ; P :: (c:\oplus \{\ell : A_{\ell}\}_{\ell \in L})} \; \oplus R \qquad \frac{\Delta, c{:}A_{\ell} \vdash Q_{\ell} :: (d:D) \quad \text{for every } \ell \in L}{\Delta, c{:}\oplus \{\ell : A_{\ell}\}_{\ell \in L} \vdash \mathsf{case}\; c \; (\ell \Rightarrow Q_{\ell})_{\ell \in L} :: (d:D)} \; \oplus L$$

The operational semantics is somewhat tricky, because we communicate asynchronously. We need to spawn a message carrying the label k, but we also need to make sure that the *next* message sent along the same channel does not overtake the first (which would violate session fidelity). Sending a message therefore creates a fresh continuation channel c′ for further communication, which we substitute in the continuation of the process. Moreover, the recipient also switches to this continuation channel after the message is received.

$$\begin{array}{l} \mathsf{proc}(c, c.k \; ; P) \longrightarrow \mathsf{proc}(c', [c'/c]P), \mathsf{msg}(c, c.k \; ; c \gets c') \quad (c' \; fresh) \\ \mathsf{msg}(c, c.k \; ; c \gets c'), \mathsf{proc}(d, \mathsf{case} \; c \; (\ell \Rightarrow Q\_{\ell})\_{\ell \in L}) \longrightarrow \mathsf{proc}(d, [c'/c]Q\_{k}) \end{array}$$

It is interesting that the message along c, followed by its continuation c′, can be expressed as a well-typed process expression using forwarding: c.k ; c ← c′. This pattern will work for all other pairs of send/receive operations.
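The continuation-channel discipline can be mimicked in a toy simulation (our own simplified, untyped model): each send allocates a fresh continuation channel, so two messages on the same session are chained and the second cannot overtake the first.

```python
import itertools

class Config:
    """Toy configuration for asynchronous internal choice, modeling
    msg(c, c.k ; c <- c'): each send spawns a message with a fresh
    continuation channel, chaining messages in order."""
    def __init__(self):
        self.msgs = {}                      # channel -> (label, continuation)
        self._fresh = itertools.count()

    def send(self, c, label):
        c2 = f"c{next(self._fresh)}"        # fresh continuation channel c'
        self.msgs[c] = (label, c2)          # spawn msg(c, c.label ; c <- c')
        return c2                           # provider now provides along c'

    def receive(self, c):
        return self.msgs.pop(c)             # client consumes msg, moves to c'

cfg = Config()
c1 = cfg.send('c', 'inl')    # first message along c
c2 = cfg.send(c1, 'inr')     # second message rides the continuation, not c
```

The client necessarily reads 'inl' first, then follows the returned continuation to read 'inr', which is the session-fidelity property the fresh channels enforce.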

External choice reverses the roles of client and provider, both in the typing and in the operational rules. Below are the semantics; the typing is given in Fig. 6.

$$\begin{array}{l} \mathsf{proc}(d, c.k \; ; Q) \longrightarrow \mathsf{msg}(c', c.k \; ; c' \gets c), \mathsf{proc}(d, [c'/c]Q) \quad (c' \; fresh)\\ \mathsf{proc}(c, \mathsf{case}\, c\,(\ell \Rightarrow P_{\ell})_{\ell \in L}), \mathsf{msg}(c', c.k \; ; c' \gets c) \longrightarrow \mathsf{proc}(c', [c'/c]P_{k}) \end{array}$$

**Sending and Receiving Channels.** Session types are *higher-order* in the sense that we can send and receive channels along channels. Sending a channel is perhaps less intuitive from the logical point of view, so we show that case in detail and just summarize the rules for receiving.

If we provide c : A ⊗ B, we send a channel a : A along c and continue as B. From the typing perspective, it is a restricted form of the usual two-premise ⊗R rule, obtained by requiring the first premise to be an identity. This restriction separates the spawning of new processes from the sending of channels.

$$\frac{\Delta \vdash P :: (c:B)}{\Delta, a{:}A \vdash \mathtt{send}\ c\ a \; ; P :: (c:A \otimes B)} \; \otimes R^* \qquad \frac{\Delta, x{:}A, c{:}B \vdash Q :: (d:D)}{\Delta, c{:}A \otimes B \vdash x \gets \mathtt{recv}\ c \; ; Q :: (d:D)} \; \otimes L$$

The operational rules follow the same patterns as the previous case.

$$\begin{array}{l} \mathsf{proc}(c, \mathtt{send}\ c\ a \; ; P) \longrightarrow \mathsf{proc}(c', [c'/c]P), \mathsf{msg}(c, \mathtt{send}\ c\ a \; ; c \gets c') \quad (c' \; fresh)\\ \mathsf{msg}(c, \mathtt{send}\ c\ a \; ; c \gets c'), \mathsf{proc}(d, x \gets \mathtt{recv}\ c \; ; Q) \longrightarrow \mathsf{proc}(d, [c'/c][a/x]Q) \end{array}$$

Receiving a channel (written as the linear implication A ⊸ B) works symmetrically. Below are the semantics; the typing is shown in Fig. 6.

$$\begin{array}{l} \mathsf{proc}(d,\mathsf{send}\ c\ a\ ;Q) \longrightarrow \mathsf{msg}(c',\mathsf{send}\ c\ a\ ;c'\leftarrow c),\mathsf{proc}(d,[c'/c]Q) \quad (c'\text{ fresh})\\ \mathsf{proc}(c,x\leftarrow\mathsf{recv}\ c\ ;P),\mathsf{msg}(c',\mathsf{send}\ c\ a\ ;c'\leftarrow c) \longrightarrow \mathsf{proc}(c',[c'/c][a/x]P) \end{array}$$

**Termination.** We have already seen that a process can terminate by forwarding. Communication along a channel ends explicitly when it has type **1** (the unit of ⊗) and is closed. By linearity there must be no antecedents in the right rule.

$$\frac{}{\cdot \vdash \mathsf{close}\ c :: (c:\mathbf{1})}\ \mathbf{1}R \qquad \frac{\Delta \vdash Q :: (d:D)}{\Delta, c:\mathbf{1} \vdash \mathsf{wait}\ c\ ;\ Q :: (d:D)}\ \mathbf{1}L$$

Since there cannot be any continuation, the message takes a simple form.

$$\begin{array}{c} \mathsf{proc}(c, \mathsf{close} \, c) \longrightarrow \mathsf{msg}(c, \mathsf{close} \, c) \\ \mathsf{msg}(c, \mathsf{close} \, c), \mathsf{proc}(d, \mathsf{wait} \, c \, ; \, Q) \longrightarrow \mathsf{proc}(d, Q) \end{array}$$

**Quantification.** First-order quantification over elements of domains such as integers, strings, or booleans allows ordinary basic data values to be sent and received. At the moment, since we have no type families indexed by values, the quantified variables cannot actually appear in their scope. This will change in Sect. 5, so we anticipate it in these rules.

The proof of an existential quantifier contains a witness term, whose value is what is sent. In order to track variables ranging over values, a new context Ψ is added to all judgments and the preceding rules are modified accordingly. All value variables n declared in context Ψ must be distinct. Such variables are not linear, but can be arbitrarily reused, and are therefore propagated to all premises in all rules. We write Ψ ⊢ v : τ to check that value v has type τ in context Ψ.

$$\frac{\Psi \vdash v : \tau \quad \Psi\ ;\ \Delta \vdash P :: (c : [v/n]A)}{\Psi\ ;\ \Delta \vdash \mathsf{send}\ c\ v\ ; P :: (c : \exists n{:}\tau. A)}\ \exists R \qquad \frac{\Psi, n{:}\tau\ ;\ \Delta, c : A \vdash Q :: (d:D)}{\Psi\ ;\ \Delta, c : \exists n{:}\tau. A \vdash n \leftarrow \mathsf{recv}\ c\ ; Q :: (d:D)}\ \exists L$$

$$\begin{array}{l} \mathsf{proc}(c, \mathsf{send}\ c\ v\ ; P) \longrightarrow \mathsf{proc}(c', [c'/c]P), \mathsf{msg}(c, \mathsf{send}\ c\ v\ ; c \leftarrow c') \quad (c'\ \text{fresh})\\ \mathsf{msg}(c, \mathsf{send}\ c\ v\ ; c \leftarrow c'), \mathsf{proc}(d, n \leftarrow \mathsf{recv}\ c\ ; Q) \longrightarrow \mathsf{proc}(d, [c'/c][v/n]Q) \end{array}$$

The situation for universal quantification is symmetric. The semantics are given below and the typing is shown in Fig. 6.

$$\begin{array}{l} \mathsf{proc}(d, \mathsf{send}\ c\ v\ ; Q) \longrightarrow \mathsf{msg}(c', \mathsf{send}\ c\ v\ ; c' \leftarrow c), \mathsf{proc}(d, [c'/c]Q) \quad (c'\ \text{fresh})\\ \mathsf{proc}(c, n \leftarrow \mathsf{recv}\ c\ ; P), \mathsf{msg}(c', \mathsf{send}\ c\ v\ ; c' \leftarrow c) \longrightarrow \mathsf{proc}(c', [c'/c][v/n]P) \end{array}$$

Processes may also make internal transitions while computing ordinary values, which we don't fully specify here. Such a transition would have the form

$$\mathsf{proc}(c, P[e]) \longrightarrow \mathsf{proc}(c, P[e']) \quad \text{if} \quad e \longmapsto e'$$

where P[e] denotes a process with an ordinary value expression in evaluation position and e ↦ e′ represents a step of computation.

**Shifts.** For the purpose of monitoring, it is important to track the direction of communication. To make this explicit, we *polarize* the syntax and use *shifts* to change the direction of communication (for more detail, see prior work [18]).

$$\begin{array}{ll}\text{Negative types} & A^-, B^- ::= \&\{\ell : A_\ell^-\}_{\ell \in L} \mid A^+ \multimap B^- \mid \forall n{:}\tau. A^- \mid \uparrow A^+\\\text{Positive types} & A^+, B^+ ::= \oplus\{\ell : A_\ell^+\}_{\ell \in L} \mid A^+ \otimes B^+ \mid \mathbf{1} \mid \exists n{:}\tau. A^+ \mid \downarrow A^-\\\text{Types} & A, B, C, D ::= A^- \mid A^+ \end{array}$$

From the perspective of the provider, all negative types receive and all positive types send. It is then clear that ↑A must receive a shift message and then start sending, while ↓A must send a shift message and then start receiving. For this restricted form of shift, the logical rules are otherwise uninformative. The semantics are given below and the typing is shown in Fig. 6.

$$\begin{array}{l} \mathsf{proc}(c, \mathsf{send}\ c\ \mathsf{shift}\ ; P) \longrightarrow \mathsf{proc}(c', [c'/c]P), \mathsf{msg}(c, \mathsf{send}\ c\ \mathsf{shift}\ ; c \leftarrow c') \quad (c'\ \text{fresh})\\ \mathsf{msg}(c, \mathsf{send}\ c\ \mathsf{shift}\ ; c \leftarrow c'), \mathsf{proc}(d, \mathsf{shift} \leftarrow \mathsf{recv}\ c\ ; Q) \longrightarrow \mathsf{proc}(d, [c'/c]Q)\\ \mathsf{proc}(d, \mathsf{send}\ c\ \mathsf{shift}\ ; Q) \longrightarrow \mathsf{msg}(c', \mathsf{send}\ c\ \mathsf{shift}\ ; c' \leftarrow c), \mathsf{proc}(d, [c'/c]Q) \quad (c'\ \text{fresh})\\ \mathsf{proc}(c, \mathsf{shift} \leftarrow \mathsf{recv}\ c\ ; P), \mathsf{msg}(c', \mathsf{send}\ c\ \mathsf{shift}\ ; c' \leftarrow c) \longrightarrow \mathsf{proc}(c', [c'/c]P) \end{array}$$

**Recursive Types.** Practical programming with session types requires them to be recursive, and the processes using them must be recursive as well. For example, lists with elements of type int can be defined as the purely positive type list<sup>+</sup>.

```
list+ = ⊕{ cons : ∃n:int. list+, nil : 1 }
```
A provider of type <sup>c</sup> : list<sup>+</sup> is required to send a sequence such as cons·v1·cons·v2 ··· where each v<sup>i</sup> is an integer. If the sequence is finite, it must be terminated with nil · end. In the form of a grammar, we could write

$$\mathit{From} ::= \mathsf{cons} \cdot v \cdot \mathit{From} \mid \mathsf{nil} \cdot \mathsf{end}$$
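To make the grammar concrete, here is a minimal sketch in Python (our own illustration, not the paper's process language; the encoding of a message trace as a flat list of labels and payloads is an assumption for this sketch):

```python
# Validate that a finite message trace along a channel of type list+
# follows the grammar  From ::= cons . v . From | nil . end
def valid_list_trace(trace):
    i = 0
    while i < len(trace) and trace[i] == "cons":
        # each cons label must be followed by an integer payload
        if i + 1 >= len(trace) or not isinstance(trace[i + 1], int):
            return False
        i += 2
    # a finite trace must be terminated with nil . end
    return trace[i:] == ["nil", "end"]
```

For example, `valid_list_trace(["cons", 1, "cons", 2, "nil", "end"])` accepts, while a cons without an integer payload is rejected.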

A second example is a multiset (bag) of integers, where the interface allows inserting and removing elements, and testing if it is empty. If the bag is empty when tested, the provider terminates after responding with the empty label.

$$\mathsf{bag}^- = \&\{\ \mathsf{insert} : \forall n{:}\mathsf{int}.\ \mathsf{bag}^-,\ \mathsf{remove} : \forall n{:}\mathsf{int}.\ \mathsf{bag}^-,\ \mathsf{is\_empty} : {\uparrow} \oplus \{\ \mathsf{empty} : \mathbf{1},\ \mathsf{nonempty} : {\downarrow} \mathsf{bag}^-\ \}\ \}$$

The protocol now describes the following grammar of exchanged messages, where *To* goes to the provider, *From* comes from the provider, and v stands for integers.

```
To ::= insert · v · To | remove · v · To | is_empty · shift · From
From ::= empty · end | nonempty · shift · To
```
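The alternation between the *To* and *From* grammars at each shift can be checked mechanically; a minimal Python sketch (our own illustration, with the flat-list trace encoding assumed as before):

```python
# Validate a finite message trace against the bag protocol:
#   To   ::= insert . v . To | remove . v . To | is_empty . shift . From
#   From ::= empty . end | nonempty . shift . To
def valid_bag_trace(trace):
    state, i = "To", 0
    while i < len(trace):
        if state == "To":
            if trace[i] in ("insert", "remove"):
                # label must be followed by an integer payload
                if i + 1 >= len(trace) or not isinstance(trace[i + 1], int):
                    return False
                i += 2
            elif trace[i : i + 2] == ["is_empty", "shift"]:
                i, state = i + 2, "From"   # direction changes at the shift
            else:
                return False
        else:  # state == "From"
            if trace[i : i + 2] == ["empty", "end"]:
                return i + 2 == len(trace)  # session closed
            elif trace[i : i + 2] == ["nonempty", "shift"]:
                i, state = i + 2, "To"
            else:
                return False
    return True  # a finite prefix of an ongoing session is well-formed
```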
For these protocols to be realized in this form and support rich subtyping and refinement types without change of protocol, it is convenient for recursive types to be *equirecursive*. This means a defined type such as list<sup>+</sup> is viewed as *equal* to its definition ⊕{...} rather than *isomorphic*. For this view to be consistent, we require type definitions to be *contractive* [11], that is, they need to provide at least one send or receive interaction before recursing.

The most popular formalization of equirecursive types is to introduce an explicit <sup>μ</sup>-constructor. For example, list<sup>+</sup> = μα. ⊕{ cons : <sup>∃</sup>n:int. α, nil : **<sup>1</sup>** } with rules unrolling the type μα. A to [(μα. A)/α]A. An alternative (see, for example, Balzer and Pfenning [3]) is to use an explicit definition just as we stated, for example, list and bag, and consider the left-hand side *equal* to the right-hand side in our discourse. In typing, this works without a hitch. When we consider subtyping explicitly, we need to make sure we view inference systems on types as being defined *co-inductively*. Since a co-inductively defined judgment essentially expresses the absence of a counterexample, this is exactly what we need for the operational properties like progress, preservation, or absence of blame. We therefore adopt this view.

**Recursive Processes.** In addition to recursively defined types, we also need recursively defined processes. We follow the general approach of Toninho et al. [23] for the integration of a (functional) data layer into session-typed communication. A process can be named p, ascribed a type, and be defined as follows.

$$\begin{array}{l} p : \forall n_1{:}\tau_1. \ldots \forall n_k{:}\tau_k.\ \{A \leftarrow A_1, \ldots, A_m\}\\ x \leftarrow p\ n_1 \ldots n_k \leftarrow y_1, \ldots, y_m = P \end{array}$$

where we check $(n_1{:}\tau_1, \ldots, n_k{:}\tau_k)\ ;\ (y_1{:}A_1, \ldots, y_m{:}A_m) \vdash P :: (x : A)$.

We use such process definitions when spawning a new process with the syntax

$$c \leftarrow p\ e_1 \ldots e_k \leftarrow d_1, \ldots, d_m\ ;\ P$$

which we check with the rule

$$\frac{(\Psi \vdash e_i : \tau_i)_{i \in \{1, \ldots, k\}} \quad \Delta' = (d_1{:}A_1, \ldots, d_m{:}A_m) \quad \Psi\ ;\ \Delta, c : A \vdash Q :: (d:D)}{\Psi\ ;\ \Delta, \Delta' \vdash c \leftarrow p\ e_1 \ldots e_k \leftarrow d_1, \ldots, d_m\ ;\ Q :: (d:D)}\ \mathsf{pdef}$$

After evaluating the value arguments, the call consumes the channels $d_j$ (which will no longer be available to the continuation Q, due to linearity). The continuation Q will then be the (sole) client of c. The new process providing c will execute $[c/x][d_1/y_1] \cdots [d_m/y_m]P$.

One further shorthand used in the examples: a tail call $c \leftarrow p\ \overline{e} \leftarrow \overline{d}$ in the definition of a process that provides along c is expanded into $c' \leftarrow p\ \overline{e} \leftarrow \overline{d}\ ;\ c \leftarrow c'$ for a fresh $c'$. Depending on how forwarding is implemented, however, the direct tail call may be much more efficient [13].

**Stopping Computation.** Finally, in order to be able to successfully monitor computation, we need the capability to stop the computation. We add an abort l construct that aborts on a particular label. We also add assert blocks to check conditions on observable values. The semantics are given below and the typing is in Fig. 6.

$$\mathsf{proc}(c, \mathsf{assert}\ l\ \mathsf{True}\ ; Q) \longrightarrow \mathsf{proc}(c, Q) \qquad \mathsf{proc}(c, \mathsf{assert}\ l\ \mathsf{False}\ ; Q) \longrightarrow \mathsf{abort}(l)$$

Progress and preservation were proven for the above system, with the exception of the abort and assert rules, in prior work [18]. The additional proof cases do not change the proof significantly.

### **3 Contract Examples**

In this section, we present monitoring processes that can enforce a variety of contracts. The examples will mainly use lists as defined in the previous section. Our monitors are transparent, that is, they do not change the computation. We accomplish this by making them act as partial identities (described in more detail in Sect. 4). Therefore, any monitor that enforces a contract on a list must peel off each layer of the type one step at a time (by sending or receiving over the channel as dictated by the type), perform the required checks on values or labels, and then reconstruct the original type (again, by sending or receiving as appropriate).

**Refinement.** The simplest kind of monitoring process we can write is one that models a refinement of an integer type; for example, a process that checks whether every element in the list is positive. This is a recursive process that receives the head of the list from channel b, checks whether it is positive (if yes, it continues to the next value, if not it aborts), and then sends the value along to reconstruct the monitored list a. We show three refinement monitors in Fig. 1. The process pos implements the refinement mentioned above.

```
pos : {list ← list}
a ← pos ← b =
  case b of
  | nil ⇒ a.nil ; wait b ; close a
  | cons ⇒ x ← recv b ;
    assert(x > 0)ρ ;
    a.cons ; send a x ;
    a ← pos ← b

empty : {list ← list}
a ← empty ← b =
  case b of
  | nil ⇒ wait b ;
    a.nil ; close a
  | cons ⇒ abortρ

nempty : {list ← list}
a ← nempty ← b =
  case b of
  | nil ⇒ abortρ
  | cons ⇒ a.cons ;
    x ← recv b ;
    send a x ; a ← b
```
**Fig. 1.** Refinement examples

Our monitors can also exploit information that is contained in the labels in the external and internal choices. The empty process checks whether the list b is empty and aborts if b sends the label cons. Similarly, the nempty monitor checks whether the list b is not empty and aborts if b sends the label nil. These two monitors can then be used by a process that zips two lists and aborts if they are of different lengths. These two monitors enforce the refinements {nil} ⊆ {nil, cons} and {cons} ⊆ {nil, cons}. We discuss how to generate monitors from refinement types in more detail in Sect. 5.
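The pass-through behavior of the pos refinement monitor can be simulated over a complete message trace; a hedged Python sketch (our own encoding: a flat list of labels and payloads stands in for the channel, and an exception stands in for abort with label ρ):

```python
class ContractViolation(Exception):
    """Stands in for abort with a contract label."""
    pass

def pos(trace):
    """Forward a list+ trace unchanged, asserting every element is positive."""
    out = []
    i = 0
    while trace[i] == "cons":
        x = trace[i + 1]
        if not x > 0:                       # assert(x > 0)_rho
            raise ContractViolation("rho")  # abort_rho
        out += ["cons", x]                  # reconstruct the monitored list
        i += 2
    return out + ["nil", "end"]
```

Note that on a trace with no violations the output equals the input, reflecting that the monitor is a partial identity.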

**Monitors with Internal State.** We now move beyond refinement contracts, and model contracts that have to maintain some internal state (Fig. 2).

We first present a monitor that checks whether the given list is sorted in ascending order (ascending). The monitor's state consists of a lower bound on the subsequent elements in the list. This value has an option type, which can either be None if no bound has yet been set, or Some b if b is the current bound.

If the list is empty, there is no bound to check, so no contract failure can happen. If the list is nonempty, we check to see if a bound has already been set. If not, we set the bound to be the first received element. If there is already a bound in place, then we check if the received element is greater or equal to the bound. If it is not, then the list must be unsorted, so we abort with a contract

```
ascending : option int → {list ← list}
m ← ascending bound ← n =
  case n of
  | nil ⇒ m.nil ; wait n ; close m
  | cons ⇒ x ← recv n ;
    case bound of
    | None ⇒ m.cons ; send m x ;
      m ← ascending (Some x) ← n
    | Some a ⇒ assert (x ≥ a)ρ ;
      m.cons ; send m x ;
      m ← ascending (Some x) ← n

match : int → {list ← list}
a ← match count ← b =
  case b of
  | nil ⇒ assert (count = 0)ρ ;
    a.nil ; wait b ; close a
  | cons ⇒ a.cons ; x ← recv b ;
    if (x = 1) then send a x ;
      a ← match (count + 1) ← b
    else if (x = −1)
      then assert (count > 0)ρ ;
        send a x ;
        a ← match (count − 1) ← b
    else abortρ //invalid input
```
**Fig. 2.** Monitors using internal state

failure. Note that the output list m is the same as the input list n because every element that we examine is then passed along unchanged to m.
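The state logic of ascending can be transcribed as a hedged Python sketch (our own encoding, as before: a flat trace stands in for the channel, `None` plays the role of the unset option-typed bound):

```python
class ContractViolation(Exception):
    """Stands in for abort with a contract label."""
    pass

def ascending(trace, bound=None):
    """Forward a list+ trace unchanged, asserting it is sorted ascending."""
    out = []
    i = 0
    while trace[i] == "cons":
        x = trace[i + 1]
        # Some a case: check the received element against the bound
        if bound is not None and not x >= bound:   # assert (x >= a)_rho
            raise ContractViolation("rho")
        bound = x              # (Some x) becomes the new lower bound
        out += ["cons", x]     # pass the element along unchanged
        i += 2
    return out + ["nil", "end"]
```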

We can use the ascending monitor to verify that the output list of a sorting procedure is in sorted order. To take the example one step further, we can verify that the elements in the output list are in fact a permutation of the elements in the input list of the sorting procedure as follows. Using a reasonable hash function, we hash each element as it is sent to the sorting procedure. Our monitor then keeps track of a running total of the sum of the hashes, and as elements are received from the sorting procedure, it computes their hash and subtracts it from the total. After all of the elements are received, we check that the total is 0 – if it is, with high probability, the two lists are permutations of each other. This example is an instance of *result checking*, inspired by Wasserman and Blum [26]. The monitor encoding is straightforward and omitted from the paper.
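The running-total arithmetic of this omitted monitor can be sketched in a few lines of Python (our own illustration; `h` stands for the chosen hash function, here defaulting to Python's built-in `hash` purely for demonstration):

```python
def permutation_check(inputs, outputs, h=hash):
    """With high probability, detect when `outputs` is not a permutation of
    `inputs`: add h(x) for each element sent to the sorting procedure and
    subtract h(y) for each element received back."""
    total = 0
    for x in inputs:
        total += h(x)
    for y in outputs:
        total -= h(y)
    return total == 0   # 0 means no discrepancy was detected
```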

Our next example match validates whether a set of right and left parentheses match. The monitor can use its internal state to push every left parenthesis it sees on its stack and to pop it off when it sees a right parenthesis. For brevity, we model our list of parentheses by marking every left parenthesis with a 1 and every right parenthesis with a −1. So the sequence ()()) would look like 1, −1, 1, −1, −1. As we can see, this is not a proper sequence of parentheses because adding all of the integer representations does not yield 0. In a similar vein, we can implement a process that checks that a tree is serialized correctly, which is related to recent work on context-free session types by Thiemann and Vasconcelos [21].
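The counting discipline of the match monitor on this ±1 encoding can be sketched as follows (a minimal Python illustration of Fig. 2's logic, not the paper's process language):

```python
def matched(seq):
    """Check the +-1 encoding of a parenthesis sequence, mirroring `match`."""
    count = 0
    for x in seq:
        if x == 1:             # left parenthesis: increment the counter
            count += 1
        elif x == -1:          # right parenthesis: assert(count > 0) then pop
            if count == 0:
                return False
            count -= 1
        else:
            return False       # invalid input: abort
    return count == 0          # assert(count = 0) at the end of the list
```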

**Mapper.** Finally, we can also define monitors that check higher-order contracts, such as a contract for a mapping function (Fig. 3). Consider the mapper which takes an integer and doubles it, and a function map that applies this mapper to a list of integers to produce a new list of integers. We can see that any integer that the mapper has produced will be strictly larger than the original integer, assuming the original integer is positive. In order to monitor this contract, it makes sense to impose a contract on the mapper itself. This mapper mon process enforces both the precondition, that the original integer is positive, and the

```
mapper_tp = &{ done : 1, next : ∀n : int. ∃n : int. mapper_tp }
m ← mapper =
  case m of
  | done ⇒ close m
  | next ⇒ x ← recv m ; send m (2 ∗ x) ; m ← mapper

map : {list ← mapper_tp ; list}
k ← map ← m l =
  case l of
  | nil ⇒ m.done ; k.nil ; wait l ; close k
  | cons ⇒ m′ ← mapper_mon ← m ; //run monitor
    x ← recv l ; send m′ x ; y ← recv m′ ;
    k.cons ; send k y ; k ← map ← m′ l

mapper_mon : {mapper_tp ← mapper_tp}
n ← mapper_mon ← m =
  case n of
  | done ⇒ m.done ; wait m ; close n
  | next ⇒ x ← recv n ; assert(x > 0)ρ1 ; //checks precondition
    m.next ; send m x ; y ← recv m ; assert(y > x)ρ2 ; //checks postcondition
    send n y ; n ← mapper_mon ← m
```
**Fig. 3.** Higher-order monitor

postcondition, that the resulting integer is greater than the original. We can now run the monitor on the mapper, in the map process, before applying the mapper to the list l.
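Read functionally, mapper_mon behaves like a wrapper around the mapper; a hedged Python sketch (our own encoding of the per-element channel exchange as a function call):

```python
class ContractViolation(Exception):
    """Stands in for abort with a contract label."""
    pass

def mapper_mon(mapper):
    """Wrap `mapper`, checking the precondition on each argument and the
    postcondition on each result, then passing the result along."""
    def monitored(x):
        if not x > 0:                    # assert(x > 0)_rho1: precondition
            raise ContractViolation("rho1")
        y = mapper(x)
        if not y > x:                    # assert(y > x)_rho2: postcondition
            raise ContractViolation("rho2")
        return y
    return monitored
```

For a doubling mapper, `mapper_mon(lambda x: 2 * x)` passes every positive input through unchanged in observable behavior, while a non-increasing mapper trips the postcondition.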

### **4 Monitors as Partial Identity Processes**

In the literature, contracts are often depicted as guards on values sent to and returned from functions. In our case, they really *are* processes that monitor message-passing communications between processes. For us, a central property of contracts is that a program may be executed with or without contract checking and, unless an alarm is raised, the observable outcome should be the same. This means that contract monitors should be *partial identity processes*, passing messages back and forth along channels while testing properties of the messages.

This may seem very limiting at first, but session-typed processes can maintain local state. For example, consider the functional notion of a *dependent contract*, where the contract on the result of a function depends on its input. Here, a function would be implemented by a process to which you send the arguments and which sends back the return value *along the same channel*. Therefore, a monitor can remember any (non-linear) "argument values" and use them to validate the "result value". Similarly, when a list is sent element by element, properties that can be easily checked include constraints on its length, or whether it is in ascending order. Moreover, local state can include additional (private) concurrent processes.

This raises a second question: how can we guarantee that a monitor really is a partial identity? The criterion should be general enough to allow us to naturally express the contracts from a wide range of examples. A key constraint is that *contracts are expressed as session-typed processes*, just like functional contracts should be expressed within the functional language, or object contracts within the object oriented language, etc.

The purpose of this section is to present and prove the correctness of a criterion on session-typed processes that guarantees that they are observationally equivalent to partial identity processes. All the contracts in this paper can be verified to be partial identities under our definition.

### **4.1 Buffering Values**

As a first simple example, let's take a process that receives one positive integer n and factors it into two integers p and q with p ≤ q, which are sent back. The part of the specification that is *not* enforced is that if n is not prime, p and q should be proper factors, but we at least enforce that all numbers are positive and that n = p ∗ q. For the purpose of exposition, we are being very particular here, marking the place where the direction of communication changes with a shift (↑). Since a minimal number of shifts can be inferred during elaboration of the syntax [18], we suppress them in most examples.

    factor_t = ∀n:int. ↑ ∃p:int. ∃q:int. 1

    factor_monitor : {factor_t ← factor_t}
    c ← factor_monitor ← d =
      n ← recv c ; assert (n > 0)ρ1 ; shift ← recv c ;
      send d n ; send d shift ;
      p ← recv d ; assert (p > 0)ρ2 ;
      q ← recv d ; assert (q > 0)ρ3 ;
      assert (p ≤ q)ρ4 ; assert (n = p ∗ q)ρ5 ;
      send c p ; send c q ; c ← d

This is a one-time interaction (the session type factor t is not recursive), so the monitor terminates. It terminates here by forwarding, but we could equally well have replaced it by its identity-expanded version at type **1**, which is wait d ; close c.
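The sequence of assertions performed by factor_monitor over one exchange can be sketched in Python (our own encoding: the provider's single request/response interaction is modeled as a function call, an exception stands in for abort):

```python
class ContractViolation(Exception):
    """Stands in for abort with one of the labels rho1..rho5."""
    pass

def factor_monitor(n, provider):
    """Check n, pass it to `provider`, then validate its answer (p, q)
    before passing it back to the client."""
    if not n > 0:
        raise ContractViolation("rho1")
    p, q = provider(n)
    if not p > 0:
        raise ContractViolation("rho2")
    if not q > 0:
        raise ContractViolation("rho3")
    if not p <= q:
        raise ContractViolation("rho4")
    if n != p * q:
        raise ContractViolation("rho5")
    return p, q
```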

The contract could be invoked by the provider or by the client. Let's consider how a provider factor might invoke it:

$$\begin{array}{l} \mathsf{factor} : \{\mathsf{factor\_t}\}\\ c \leftarrow \mathsf{factor} = c' \leftarrow \mathsf{factor\_raw}\ ;\ c' \leftarrow \mathsf{factor\_monitor} \leftarrow c'\ ;\ c \leftarrow c' \end{array}$$

To check that factor monitor is a partial identity we need to track that p and q are received from the provider, in this order. In general, for any received message, we need to enter it into a message queue q and we need to check that the messages are passed on in the correct order. As a first cut (to be generalized several times), we write for negative types:

$$[q](b:B^-)\ ;\ \Psi \vdash P :: (a:A^-)$$

which expresses that the two endpoints of the monitor are a : A<sup>−</sup> and b : B<sup>−</sup> (both negative), and we have already received the messages in q along a. The context Ψ declares types for local variables.

A monitor, at the top level, is defined with

$$\begin{array}{l} mon : \tau_1 \to \cdots \to \tau_n \to \{A \leftarrow A\}\\ a \leftarrow mon\ x_1 \ldots x_n \leftarrow b = P \end{array}$$

where context Ψ declares value variables x. The body P here is type-checked as one of (depending on the polarity of A)

$$[\ ](b:A^-)\ ;\ \Psi \vdash P :: (a:A^-) \quad\text{or}\quad (b:A^+)\ ;\ \Psi \vdash P :: [\ ](a:A^+)$$

where $\Psi = (x_1{:}\tau_1) \cdots (x_n{:}\tau_n)$. A use such as

$$c \gets mon\ e\_1 \ldots e\_n \gets c$$

is transformed into

$$c' \leftarrow mon\ e_1 \ldots e_n \leftarrow c\ ;\ c \leftarrow c'$$

for a fresh $c'$ and type-checked accordingly.

In general, queues have the form $q = m_1 \cdots m_n$ with

$$m ::= \ell \mid x \mid v \mid \mathsf{shift} \mid \mathsf{end}$$

where $m_1$ is the front of the queue and $m_n$ the back.

When a process P receives a message, we add it to the end of the queue q. We also need to add it to the context Ψ, marked as *unrestricted* (non-linear), to remember its type. In our example, τ = int.

$$\frac{[q \cdot n](b:B)\ ;\ \Psi, n{:}\tau \vdash P :: (a:A^-)}{[q](b:B)\ ;\ \Psi \vdash n \leftarrow \mathsf{recv}\ a\ ;\ P :: (a : \forall n{:}\tau. A^-)}\ \forall R$$

Conversely, when we *send* along b, the message must be equal to the one at the front of the queue (and therefore it must be a variable). Here m is a value variable; it remains in the context so it can be reused for later assertion checks, but it can never be sent again since it has been removed from the queue.

$$\frac{[q](b : [m/n]B)\ ;\ \Psi, m{:}\tau \vdash P :: (a:A)}{[m \cdot q](b : \forall n{:}\tau. B)\ ;\ \Psi, m{:}\tau \vdash \mathsf{send}\ b\ m\ ;\ P :: (a:A)}\ \forall L$$

All the other send and receive rules for negative types (&, ⊸) follow exactly the same pattern. For positive types, a queue must be associated with the channel along which the monitor provides (the succedent of the sequent judgment).

$$(b:B^+)\ ;\ \Psi \vdash Q :: [q](a:A^+)$$

Moreover, when end has been received along b the corresponding process has terminated and the channel is closed, so we generalize the judgment to

$$\omega\ ;\ \Psi \vdash Q :: [q](a:A^+) \qquad \text{with } \omega = \cdot \mid (b:B)$$

The shift messages change the direction of communication. They therefore need to switch between the two judgments and also ensure that the queue has been emptied before we switch direction. Here are the two rules for ↑, which appears in our simple example:

$$\frac{[q \cdot \mathsf{shift}](b:B^-)\ ;\ \Psi \vdash P :: (a:A^+)}{[q](b:B^-)\ ;\ \Psi \vdash \mathsf{shift} \leftarrow \mathsf{recv}\ a\ ;\ P :: (a : \uparrow A^+)}\ \uparrow R$$

We notice that after receiving a shift, the channel a already changes polarity (we now have to send along it), so we generalize the judgment, allowing the succedent to be either positive or negative. And conversely for the other judgment.

$$\begin{array}{l} [q](b:B^-)\ ;\ \Psi \vdash P :: (a:A)\\ \omega\ ;\ \Psi \vdash Q :: [q](a:A^+) \end{array} \qquad \text{where } \omega = \cdot \mid (b:B)$$

When we *send* the final shift, we initialize a new empty queue. Because the queue is empty the two sides of the monitor must have the same type.

$$\frac{(b:B^+)\ ;\ \Psi \vdash Q :: [\ ](a:B^+)}{[\mathsf{shift}](b : \uparrow B^+)\ ;\ \Psi \vdash \mathsf{send}\ b\ \mathsf{shift}\ ;\ Q :: (a:B^+)}\ \uparrow L$$

The rules for forwarding are also straightforward. Both sides need to have the same type, and the queue must be empty. As a consequence, the immediate forward is always a valid monitor at a given type.

$$\frac{}{(b:A^+)\ ;\ \Psi \vdash a \leftarrow b :: [\ ](a:A^+)}\ \mathsf{id}^+ \qquad \frac{}{[\ ](b:A^-)\ ;\ \Psi \vdash a \leftarrow b :: (a:A^-)}\ \mathsf{id}^-$$
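Operationally, the FIFO discipline these rules enforce can be pictured as follows; a minimal Python sketch (our own illustration, not part of the paper's formalism): messages received on one side are enqueued at the back, and each message sent on the other side must equal the element at the front.

```python
from collections import deque

class NotIdentity(Exception):
    """Raised when a send does not replay the front of the queue."""
    pass

class MonitorQueue:
    def __init__(self):
        self.q = deque()

    def recv(self, msg):
        # receiving rules: enqueue the message at the back (q . m)
        self.q.append(msg)

    def send(self, msg):
        # sending rules: the message must match the front (m . q)
        if not self.q or self.q[0] != msg:
            raise NotIdentity(msg)
        self.q.popleft()

    def empty(self):
        # must hold before a shift changes the direction of communication
        return not self.q
```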

#### **4.2 Rule Summary**

The current rules allow us to communicate *only along the channels* a *and* b *that are being monitored*. If we send channels along channels, however, these channels must be recorded in the typing judgment, but we are not allowed to communicate along them directly. On the other hand, if we spawn internal (local) channels, say, as auxiliary data structures, we should be able to interact with them since such interactions are not externally observable. Our judgment thus requires two additional contexts: Δ for channels internal to the monitor, and Γ for externally visible channels that may be sent along the monitored channels. Our full judgments therefore are

$$\begin{array}{l} [q](b:B^-)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash P :: (a:A)\\ \omega\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash Q :: [q](a:A^+) \end{array}$$

So far, the system is given by the following rules.

$$\frac{(\forall \ell \in L) \quad (b:B_\ell)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash Q_\ell :: [q \cdot \ell](a:A^+)}{(b : \oplus\{\ell : B_\ell\}_{\ell \in L})\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash \mathsf{case}\ b\ (\ell \Rightarrow Q_\ell)_{\ell \in L} :: [q](a:A^+)}\ \oplus L$$

$$\frac{\omega\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash P :: [q](a:B_k) \quad (k \in L)}{\omega\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash a.k\ ;\ P :: [k \cdot q](a : \oplus\{\ell : B_\ell\}_{\ell \in L})}\ \oplus R$$

$$\frac{(\forall \ell \in L) \quad [q \cdot \ell](b:B)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash P_\ell :: (a:A_\ell)}{[q](b:B)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash \mathsf{case}\ a\ (\ell \Rightarrow P_\ell)_{\ell \in L} :: (a : \&\{\ell : A_\ell\}_{\ell \in L})}\ \& R$$

$$\frac{[q](b:B_k)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash P :: (a:A) \quad (k \in L)}{[k \cdot q](b : \&\{\ell : B_\ell\}_{\ell \in L})\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash b.k\ ;\ P :: (a:A)}\ \& L$$

$$\frac{(b:B)\ ;\ \Psi\ ;\ \Gamma, x{:}C\ ;\ \Delta \vdash Q :: [q \cdot x](a:A)}{(b : C \otimes B)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash x \leftarrow \mathsf{recv}\ b\ ;\ Q :: [q](a:A)}\ \otimes L \qquad \frac{\omega\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash P :: [q](a:A)}{\omega\ ;\ \Psi\ ;\ \Gamma, x{:}C\ ;\ \Delta \vdash \mathsf{send}\ a\ x\ ;\ P :: [x \cdot q](a : C \otimes A)}\ \otimes R$$

$$\frac{[q \cdot x](b:B)\ ;\ \Psi\ ;\ \Gamma, x{:}C\ ;\ \Delta \vdash P :: (a:A)}{[q](b:B)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash x \leftarrow \mathsf{recv}\ a\ ;\ P :: (a : C \multimap A)}\ \multimap R \qquad \frac{[q](b:B)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash Q :: (a:A)}{[x \cdot q](b : C \multimap B)\ ;\ \Psi\ ;\ \Gamma, x{:}C\ ;\ \Delta \vdash \mathsf{send}\ b\ x\ ;\ Q :: (a:A)}\ \multimap L$$

$$\frac{\cdot\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash Q :: [q \cdot \mathsf{end}](a:A)}{(b:\mathbf{1})\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash \mathsf{wait}\ b\ ;\ Q :: [q](a:A)}\ \mathbf{1}L \qquad \frac{}{\cdot\ ;\ \Psi\ ;\ \cdot\ ;\ \cdot \vdash \mathsf{close}\ a :: [\mathsf{end}](a:\mathbf{1})}\ \mathbf{1}R$$

$$\frac{(b:B)\ ;\ \Psi, n{:}\tau\ ;\ \Gamma\ ;\ \Delta \vdash Q :: [q \cdot n](a:A)}{(b : \exists n{:}\tau. B)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash n \leftarrow \mathsf{recv}\ b\ ;\ Q :: [q](a:A)}\ \exists L \qquad \frac{\omega\ ;\ \Psi, m{:}\tau\ ;\ \Gamma\ ;\ \Delta \vdash P :: [q](a : [m/n]A)}{\omega\ ;\ \Psi, m{:}\tau\ ;\ \Gamma\ ;\ \Delta \vdash \mathsf{send}\ a\ m\ ;\ P :: [m \cdot q](a : \exists n{:}\tau. A)}\ \exists R$$

$$\frac{[q \cdot n](b:B)\ ;\ \Psi, n{:}\tau\ ;\ \Gamma\ ;\ \Delta \vdash P :: (a:A^-)}{[q](b:B)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash n \leftarrow \mathsf{recv}\ a\ ;\ P :: (a : \forall n{:}\tau. A^-)}\ \forall R \qquad \frac{[q](b : [m/n]B)\ ;\ \Psi, m{:}\tau\ ;\ \Gamma\ ;\ \Delta \vdash P :: (a:A)}{[m \cdot q](b : \forall n{:}\tau. B)\ ;\ \Psi, m{:}\tau\ ;\ \Gamma\ ;\ \Delta \vdash \mathsf{send}\ b\ m\ ;\ P :: (a:A)}\ \forall L$$

$$\frac{(b:B^-)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash Q :: [q \cdot \mathsf{shift}](a:A^+)}{(b : \downarrow B^-)\ ;\ \Psi\ ;\ \Gamma\ ;\ \Delta \vdash \mathsf{shift} \leftarrow \mathsf{recv}\ b\ ;\ Q :: [q](a:A^+)}\ \downarrow L$$

$[\ ](b : A^-)\ ;\ \Psi\ ;\ \Gamma$
; <sup>Δ</sup> - P :: (a : A−) (<sup>b</sup> : <sup>A</sup>−) ; <sup>Ψ</sup> ; <sup>Γ</sup> ; <sup>Δ</sup> send <sup>a</sup> shift ; <sup>P</sup> :: [shift](<sup>a</sup> : <sup>↓</sup>A−) <sup>↓</sup><sup>R</sup> [<sup>q</sup> · shift](<sup>b</sup> : <sup>B</sup>−) ; <sup>Ψ</sup> ; <sup>Γ</sup> ; <sup>Δ</sup> - P :: (a : A<sup>+</sup>) [q](<sup>b</sup> : <sup>B</sup>−) ; <sup>Ψ</sup> ; <sup>Γ</sup> ; <sup>Δ</sup> shift <sup>←</sup> recv <sup>a</sup> ; <sup>P</sup> :: (<sup>a</sup> : <sup>↑</sup>A<sup>+</sup>) <sup>↑</sup><sup>R</sup> (<sup>b</sup> : <sup>B</sup><sup>+</sup>) ; <sup>Ψ</sup> ; <sup>Γ</sup> ; <sup>Δ</sup> - Q :: [ ](a : B<sup>+</sup>) [shift](<sup>b</sup> : <sup>↑</sup>B<sup>+</sup>) ; <sup>Ψ</sup> ; <sup>Γ</sup> ; <sup>Δ</sup> send <sup>b</sup> shift ; <sup>Q</sup> :: (<sup>a</sup> : <sup>B</sup><sup>+</sup>) <sup>↑</sup><sup>L</sup>

#### **4.3 Spawning New Processes**

The most complex part of checking that a process is a valid monitor involves spawning new processes. In order to be able to spawn and use local (private) processes, we have introduced the (so far unused) context Δ that tracks such channels. We use it here only in the following two rules:

$$\frac{\Psi \; ; \; \Delta \vdash P :: (c : C) \qquad \omega \; ; \; \Psi \; ; \; \Gamma \; ; \; \Delta' \vdash Q :: [q](a : A^{+})}{\omega \; ; \; \Psi \; ; \; \Gamma \; ; \; \Delta, \Delta' \vdash (c{:}C) \leftarrow P \; ; \; Q :: [q](a : A^{+})} \; \mathsf{cut}\_1^{+}$$

$$\frac{\Psi \; ; \; \Delta \vdash P :: (c : C) \qquad [q](b : B^{-}) \; ; \; \Psi \; ; \; \Gamma \; ; \; \Delta' \vdash Q :: (a : A)}{[q](b : B^{-}) \; ; \; \Psi \; ; \; \Gamma \; ; \; \Delta, \Delta' \vdash (c{:}C) \leftarrow P \; ; \; Q :: (a : A)} \; \mathsf{cut}\_1^{-}$$

The second premise (that is, the continuation of the monitor) remains a monitor, while the first premise corresponds to a freshly spawned local process accessible through channel c. All the ordinary left rules for sending or receiving along channels in Δ are also available for the two monitor validity judgments. By the strong ownership discipline of intuitionistic session types, none of this information can flow out of the monitor.
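
As an operational intuition, the discipline these rules enforce can be sketched in Python (a minimal illustration, not the paper's formal semantics; `run_monitor` and `check` are hypothetical names): a monitor may inspect and buffer messages in a local queue, but must forward each one unchanged and in order, raising an alarm rather than ever rewriting traffic.

```python
from collections import deque

ALARM = "alarm"

def run_monitor(incoming, check):
    """Drain `incoming`, buffering each message in an internal queue q
    and forwarding it unchanged; a failed check raises an alarm instead.
    The buffer is local state, like the private channels in Delta: it
    never escapes the monitor."""
    q = deque()                       # the queue [q] of the typing judgment
    outgoing = []
    for msg in incoming:
        if not check(msg):
            return ALARM, outgoing    # abort: never alter a message
        q.append(msg)                 # receive: enqueue at the back
        outgoing.append(q.popleft())  # send: dequeue from the front
    return "done", outgoing
```

On a well-behaved stream the observable behavior is exactly that of the identity, which is the "partial identity" property the typing rules are designed to guarantee.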

It is also possible for a single monitor to decompose into two monitors, composed in sequence, that operate concurrently. In that case, the queue q may be split anywhere, as long as the intermediate type has the right polarity. Note that Γ must be chosen to contain all channels in q2, while Γ′ must contain all channels in q1.

$$\frac{\omega \; ; \; \Psi \; ; \; \Gamma \; ; \; \Delta \vdash P :: [q\_2](c : C^{+}) \qquad (c : C^{+}) \; ; \; \Psi \; ; \; \Gamma' \; ; \; \Delta' \vdash Q :: [q\_1](a : A^{+})}{\omega \; ; \; \Psi \; ; \; \Gamma, \Gamma' \; ; \; \Delta, \Delta' \vdash (c{:}C^{+}) \leftarrow P \; ; \; Q :: [q\_1 \cdot q\_2](a : A^{+})} \; \mathsf{cut}\_2^{+}$$

Why is this correct? The first messages sent along a will be the messages in q1. If we receive messages along c in the meantime, they will be, first, the messages in q2 (since P is a monitor), followed by any messages that P may have received along b if ω = (b : B). The second rule is entirely symmetric, with the flow of messages in the opposite direction.

$$\frac{[q\_1](b : B^{-}) \; ; \; \Psi \; ; \; \Gamma \; ; \; \Delta \vdash P :: (c : C^{-}) \qquad [q\_2](c : C^{-}) \; ; \; \Psi \; ; \; \Gamma' \; ; \; \Delta' \vdash Q :: (a : A)}{[q\_1 \cdot q\_2](b : B^{-}) \; ; \; \Psi \; ; \; \Gamma, \Gamma' \; ; \; \Delta, \Delta' \vdash (c{:}C^{-}) \leftarrow P \; ; \; Q :: (a : A)} \; \mathsf{cut}\_2^{-}$$

The next two rules allow a monitor to be attached to a channel x that is passed between a and b. The monitored version of x is called x′, where x′ is chosen fresh. This apparently violates our property that we pass on all messages exactly as received, because here we pass on a monitored version of the original. However, if monitors are partial identities, then the original x and the new x′ are indistinguishable (unless a necessary alarm is raised), which will be a tricky part of the correctness proof.

$$\frac{(x : C^{+}) \; ; \; \Psi \; ; \; \cdot \; ; \; \Delta \vdash P :: [\,](x' : C^{+}) \qquad \omega \; ; \; \Psi \; ; \; \Gamma, x'{:}C^{+} \; ; \; \Delta' \vdash Q :: [q\_1 \cdot x' \cdot q\_2](a : A^{+})}{\omega \; ; \; \Psi \; ; \; \Gamma, x{:}C^{+} \; ; \; \Delta, \Delta' \vdash x' \leftarrow P \; ; \; Q :: [q\_1 \cdot x \cdot q\_2](a : A^{+})} \; \mathsf{cut}\_3^{+}$$

$$\frac{[\,](x : C^{-}) \; ; \; \Psi \; ; \; \cdot \; ; \; \Delta \vdash P :: (x' : C^{-}) \qquad [q\_1 \cdot x' \cdot q\_2](b : B^{-}) \; ; \; \Psi \; ; \; \Gamma, x'{:}C^{-} \; ; \; \Delta' \vdash Q :: (a : A)}{[q\_1 \cdot x \cdot q\_2](b : B^{-}) \; ; \; \Psi \; ; \; \Gamma, x{:}C^{-} \; ; \; \Delta, \Delta' \vdash x' \leftarrow P \; ; \; Q :: (a : A)} \; \mathsf{cut}\_3^{-}$$

There are two more versions of these rules, depending on whether the types of x and the monitored types are positive or negative. These rules play a critical role in monitoring higher-order processes, because monitoring c : A⁺ ⊸ B⁻ may require us to monitor not only the continuation c : B⁻ (already covered) but also communication along the channel x : A⁺ received along c.

In actual programs, we mostly use the cut x ← P ; Q in the form x ← p e ← d ; Q, where p is a defined process. The rules are completely analogous, except that for those rules that require splitting a context in the conclusion, the arguments d provide the split for us. When a new sub-monitor is invoked in this way, we remember and eventually check that the process p is also a partial identity process, unless we are already checking it. This has the effect that recursively defined monitors with proper recursive calls are in fact allowed. This is important, because monitors for recursive types usually have a recursive structure. An illustration of this can be seen in pos in Fig. 1.

### **4.4 Transparency**

We need to show that monitors are *transparent*, that is, they are indeed observationally equivalent to partial identity processes. Because of the richness of types and process expressions and the generality of the monitors allowed, the proof has some complexities. First, we define the configuration typing, which consists of just three rules. Because we also send and receive ordinary values, we also need to type (closed) substitutions σ = (v1/n1,...,vk/nk) using the judgment σ :: Ψ.

$$\frac{}{(\cdot) :: (\cdot)} \qquad \frac{\vdash v : \tau}{(v/n) :: (n{:}\tau)} \qquad \frac{\sigma\_1 :: \Psi\_1 \quad \sigma\_2 :: \Psi\_2}{(\sigma\_1, \sigma\_2) :: (\Psi\_1, \Psi\_2)}$$

For configurations, we use the judgment

$$
\Delta \vdash \mathcal{C} :: \Delta'
$$

which expresses that process configuration <sup>C</sup> *uses* the channels in <sup>Δ</sup> and *provides* the channels in Δ′. Channels that are neither used nor offered by C are "passed through". Messages are just a restricted form of processes, so they are typed exactly the same way. We write *pred* for either proc or msg.

$$\frac{}{\Delta \vdash (\cdot) :: \Delta} \qquad \frac{\Delta\_0 \vdash \mathcal{C}\_1 :: \Delta\_1 \quad \Delta\_1 \vdash \mathcal{C}\_2 :: \Delta\_2}{\Delta\_0 \vdash \mathcal{C}\_1, \mathcal{C}\_2 :: \Delta\_2}$$

$$\frac{\Psi \; ; \; \Delta \vdash P :: (c : A) \qquad \sigma :: \Psi}{\Delta', \Delta[\sigma] \vdash \mathit{pred}(c, P[\sigma]) :: (\Delta', c : A[\sigma])}$$

To characterize observational equivalence of processes, we first need to characterize the possible messages and the direction in which they flow: towards the client (channel type is positive) or towards the provider (channel type is negative). We summarize these in the following table. In each case, c is the channel along which the message is transmitted, and c′ is the continuation channel.


The notion of observational equivalence we need does not observe "nontermination", that is, it only compares messages that are actually received. Since messages can flow in two directions, we need to observe messages that arrive at either end. We therefore do *not* require, as is typical for bisimulation, that if one configuration takes a step, another configuration can also take a step. Instead we say if both configurations send an externally visible message, then the messages must be equivalent.

Supposing Γ ⊢ C :: Δ and Γ ⊢ D :: Δ, we write Γ ⊢ C ∼ D :: Δ for our notion of observational equivalence. It is the largest relation satisfying that Γ ⊢ C ∼ D :: Δ implies


Clauses (1) and (2) correspond to absorbing a message into a configuration, which may later be received by a process according to clauses (5) and (6).

Clauses (3) and (4) correspond to observing messages, either by a client (clause (3)) or provider (clause (4)).

In clause (3) we take advantage of the property that a new continuation channel in the message P (one that does not already appear in Γ) is always chosen fresh when created, so we can consistently (and silently) rename it in C′, Δ′1, and P (and in D′, Δ′2, and Q, respectively). This sleight of hand allows us to match up the context and messages exactly. An analogous remark applies to clause (4). A more formal description would match up the contexts and messages modulo two renaming substitutions, which allow us to leave Γ and Δ fixed.

Clauses (5) and (6) make sense because a transition never changes the interface to a configuration, except when executing a forwarding proc(a, a ← b), which substitutes b for a in the remaining configuration. We can absorb this renaming into the renaming substitution. Cut creates a new channel, which remains internal, since it is linear and will have one provider and one client within the new configuration. Unfortunately, our notation is already somewhat unwieldy, and carrying additional renaming substitutions further obscures matters. We therefore omit them in this presentation.

We now need to define a relation ∼<sup>M</sup> such that (a) it satisfies the closure conditions of ∼ and is therefore an observational equivalence, and (b) allows us to conclude that monitors satisfying our judgment are partial identities. Unfortunately, the theorem is rather complex, so we will walk the reader through a sequence of generalizations that account for various phenomena.

*The ⊕, & Fragment.* For this fragment, we have no value variables, nor are we passing channels. Then the top-level properties we would like to show are

**(1⁺)** If (y : A⁺) ; · ; · ; · ⊢ P :: [ ](x : A⁺), then y : A⁺ ⊢ proc(x, x ← y) ∼M proc(x, P) :: (x : A⁺).
**(1⁻)** If [ ](y : A⁻) ; · ; · ; · ⊢ P :: (x : A⁻), then y : A⁻ ⊢ proc(x, x ← y) ∼M proc(x, P) :: (x : A⁻).

Of course, asserting that proc(x, x ← y) ∼M proc(x, P) will be insufficient, because this relation is not closed under the conditions of observational equivalence. For example, if we add a message along y to both sides, P will change its state once it receives the message, and the queue will record that this message still has to be sent. To generalize this, we need to define the queue that corresponds to a sequence of messages. First, a single message:


We extend this to message sequences with ⟨⟨·⟩⟩ = (·) and ⟨⟨E1, E2⟩⟩ = ⟨⟨E1⟩⟩ · ⟨⟨E2⟩⟩, provided Δ0 ⊢ E1 :: Δ1 and Δ1 ⊢ E2 :: Δ2.

Then we build into the relation that sequences of messages correspond to the queue.

**(2⁺)** If (y : B⁺) ; · ; · ; · ⊢ P :: [⟨⟨E⟩⟩](x : A⁺), then y : B⁺ ⊢ E ∼M proc(x, P) :: (x : A⁺).
**(2⁻)** If [⟨⟨E⟩⟩](y : B⁻) ; · ; · ; · ⊢ P :: (x : A⁻), then y : B⁻ ⊢ E ∼M proc(x, P) :: (x : A⁻).

When we add shifts, the two propositions become mutually dependent, but otherwise they remain the same, since the definition of ⟨⟨E⟩⟩ is already general enough. But we need to generalize the type on the opposite side of the queue to be either positive or negative, because it switches polarity after a shift has been received. Similarly, the channel might terminate when receiving **1**, so we also need to allow ω, which is either empty or of the form y : B.

**(3⁺)** If ω ; · ; · ; · ⊢ P :: [⟨⟨E⟩⟩](x : A⁺), then ω ⊢ E ∼M proc(x, P) :: (x : A⁺).
**(3⁻)** If [⟨⟨E⟩⟩](y : B⁻) ; · ; · ; · ⊢ P :: (x : A), then y : B⁻ ⊢ E ∼M proc(x, P) :: (x : A).

Next, we can permit local state in the monitor (rules cut1⁺ and cut1⁻). The key fact is that neither of the two critical endpoints y and x, nor any (non-local) channels, can appear in the typing of the local process. That local process will evolve into a local configuration, but its interface will not change and it cannot access externally visible channels. So we generalize to allow a configuration D that does not use any channels, and any channels it offers are used by P.

**(4⁺)** If ω ; · ; · ; Δ ⊢ P :: [⟨⟨E⟩⟩](x : A⁺) and · ⊢ D :: Δ, then ω ⊢ E ∼M D, proc(x, P) :: (x : A⁺).
**(4⁻)** If [⟨⟨E⟩⟩](y : B⁻) ; · ; · ; Δ ⊢ P :: (x : A) and · ⊢ D :: Δ, then y : B⁻ ⊢ E ∼M D, proc(x, P) :: (x : A).

Next, we can allow value variables necessitated by the universal and existential quantifiers. Since they are potentially dependent, we need to apply the closing substitution σ to a number of components in our relation.

**(5⁺)** If ω ; Ψ ; · ; Δ ⊢ P :: [q](x : A⁺) and σ :: Ψ and q[σ] = ⟨⟨E⟩⟩ and · ⊢ D :: Δ[σ], then ω[σ] ⊢ E ∼M D, proc(x, P[σ]) :: (x : A⁺[σ]).
**(5⁻)** If [q](y : B⁻) ; Ψ ; · ; Δ ⊢ P :: (x : A) and σ :: Ψ and q[σ] = ⟨⟨E⟩⟩ and · ⊢ D :: Δ[σ], then y : B⁻[σ] ⊢ E ∼M D, proc(x, P[σ]) :: (x : A[σ]).

Breaking up the queue by spawning a sequence of monitors (rules cut2⁺ and cut2⁻) just comes down to the compositionality of the partial identity property. This is a new and separate way that two configurations might be in the ∼M relation, rather than a replacement of a previous definition.

**(6)** If ω ⊢ E1 ∼M D1 :: (z : C) and (z : C) ⊢ E2 ∼M D2 :: (x : A), then ω ⊢ (E1, E2) ∼M (D1, D2) :: (x : A).

At this point, the only types that have not yet been accounted for are ⊗ and ⊸. If these channels were only "passed through" (without the four cut3 rules), this would be rather straightforward. However, for higher-order channel-passing programs, a monitor must be able to spawn a monitor on a channel that it receives before sending on the monitored version. First, we generalize properties (5⁺) and (5⁻) to allow a context Γ of channels that may occur in the queue q and the process P, but with which P may not interact.

**(7⁺)** If ω ; Ψ ; Γ ; Δ ⊢ P :: [q](x : A⁺) and σ :: Ψ and q[σ] = ⟨⟨E⟩⟩ and · ⊢ D :: Δ[σ], then Γ[σ], ω[σ] ⊢ E ∼M D, proc(x, P[σ]) :: (x : A⁺[σ]).
**(7⁻)** If [q](y : B⁻) ; Ψ ; Γ ; Δ ⊢ P :: (x : A) and σ :: Ψ and q[σ] = ⟨⟨E⟩⟩ and · ⊢ D :: Δ[σ], then Γ[σ], y : B⁻[σ] ⊢ E ∼M D, proc(x, P[σ]) :: (x : A[σ]).

In addition we need to generalize property (6) into (8) and (9) to allow multiple monitors to run concurrently in a configuration.


At this point we can state the main theorem regarding monitors.

**Theorem 1.** *If* Γ ⊢ E ∼M D :: Δ *according to properties* (7⁺), (7⁻), (8)*, and* (9)*, then* Γ ⊢ E ∼ D :: Δ*.*

*Proof.* By closure under conditions 1–6 in the definition of ∼.

By applying it as in properties (1⁺) and (1⁻), generalized to include value variables as in (5⁺) and (5⁻), we obtain:

**Corollary 1.** *If* [ ](b : A⁻) ; Ψ ⊢ P :: (a : A⁻) *or* (b : A⁺) ; Ψ ⊢ P :: [ ](a : A⁺)*, then* P *is a partial identity process.*

### **5 Refinements as Contracts**

In this section we show how to check refinement types dynamically using our contracts. We encode refinements as type casts, which allows processes to remain well-typed with respect to the non-refinement type system (Sect. 2). These casts are translated at run time to monitors that validate whether the cast expresses an appropriate refinement. If so, the monitors behave as identity processes; otherwise, they raise an alarm and abort. For refinement contracts, we can prove a safety theorem, analogous to the classic "Well-typed Programs Can't be Blamed" [25], stating that if a monitor enforces a contract that casts from type A to type B, where A is a subtype of B, then this monitor will never raise an alarm.

### **5.1 Syntax and Typing Rules**

We first augment messages and processes to include casts as follows. We write ⟨A ⇐ B⟩ρ to denote a cast from type B to type A, where ρ is a unique label for the cast. The cast for values is written ⟨τ ⇐ τ′⟩ρ. Here, the types τ and τ′ are refinement types of the form {n:t | b}, where b is a boolean expression that expresses simple properties of the value n.

$$P ::= \dots \mid x \leftarrow \langle \tau \Leftarrow \tau' \rangle^{\rho} \; v \; ; Q \mid a : A \leftarrow \langle A \Leftarrow B \rangle^{\rho} \; b$$

Adding casts to forwarding is expressive enough to encode a more general cast ⟨A ⇐ B⟩ρ P. For instance, the process x:A ← ⟨A ⇐ B⟩ρ P ; Q can be encoded as: y:B ← P ; x:A ← ⟨A ⇐ B⟩ρ y ; Q.

One of the two additional rules for typing casts is shown below (both rules can be found in Fig. 6). We only allow casts between two types that are compatible with each other (written A ∼ B), a relation defined co-inductively on the structure of the types (the full definition is omitted from the paper).

$$\frac{A \sim B}{\Psi \; ; \; b : B \vdash a \gets \langle A \Leftarrow B \rangle^{\rho} \; b \; :: (a : A)} \; \text{id.cast}.$$

#### **5.2 Translation to Monitors**

At run time, casts are translated into monitoring processes. A cast a ← ⟨A ⇐ B⟩ρ b is implemented as a monitor. This monitor ensures that the process that offers a service on channel b behaves according to the prescribed type A. Because of the typing rules, we are assured that channel b must adhere to the type B.
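
To make the value-refinement part of such a monitor concrete, here is a small Python sketch (the function name and return convention are illustrative, not from the paper): each value received from the monitored channel b is checked against the target refinement's predicate before being forwarded along a, and a violation aborts with the contract label ρ.

```python
def cast_value_monitor(pred, rho, values_from_b):
    """Forward each value received from channel b to channel a after
    asserting the target refinement `pred`; on failure, abort the
    session with an alarm carrying the contract label rho."""
    forwarded = []
    for v in values_from_b:
        if not pred(v):                      # assert rho e(v)
            return ("abort", rho), forwarded
        forwarded.append(v)                  # send a v; keep monitoring
    return ("ok", None), forwarded
```

On values that all satisfy the refinement, the monitor is observationally an identity; the first violating value produces the alarm and nothing further is forwarded.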

Figure 4 summarizes all the translation rules except those for recursive types. The translation has the form [[⟨A ⇐ B⟩ρ]]a,b = P, where A and B are types; the channels a and b are the offering channel and the monitored channel (respectively) of the resulting monitoring process P; and ρ is the label of the monitor (i.e., of the contract).

Note that this differs from blame assignment for higher-order functions, where the monitor carries two labels: one for the argument and one for the body of the function. Here, the communication between processes is bi-directional. Though blame is always triggered by a process sending a message to the monitor, our contracts may depend on the set of values received so far, so it does not make sense to blame one party. Further, in the case of forwarding, the processes at either end of the channel may each behave according to the types (contracts) assigned to them, while the cast forcefully connects two processes that have incompatible types. In this case, it is unfair to blame either one of the processes. Instead, we raise an alarm carrying the label of the failed contract.

The translation is defined inductively over the structure of the types. The tensor rule generates a process that first receives a channel (x) from the channel being monitored (b). It then spawns a new monitor (denoted by the @monitor keyword) to monitor channel x, making sure that it behaves as type A1, and passes the new monitor's offering channel y along channel a. Finally, the monitor continues to monitor b to make sure that it behaves as type A2. The lolli rule is similar to the tensor rule, except that the monitor first receives a channel from its offering channel. Similar to the higher-order function case, the argument position is contravariant, so the newly spawned monitor checks that the received channel behaves as type B1. The exists rule generates a process that first receives a value from the channel b, then checks the boolean condition e to validate the contract. The forall rule is similar, except that the argument position is contravariant, so the boolean expression e′ is checked on the value received along the offering channel a. The with rule generates a process that checks that all of the external choices promised by the type &{ℓ : Aℓ}ℓ∈I are offered by the process being monitored. If a label in the set I is not implemented, then the monitor aborts with the label ρ. The plus rule requires that, for internal choices, the monitor checks that the monitored process only offers choices within the labels in the set ⊕{ℓ : Aℓ}ℓ∈I.
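
The label-set bookkeeping in the plus rule (and, dually, the with rule) can be sketched in Python as follows (`build_plus_branches` and the tagged-tuple encoding are hypothetical helpers, not from the paper): every label the source type's index set J permits is given a branch, and labels outside the target set I become abort branches.

```python
def build_plus_branches(I, J, rho):
    """Branch map (l => Q_l) for a cast between internal choices
    +{l:A_l, l in I} <= +{l:B_l, l in J}: the monitored process may
    send any label in J; labels also in I are forwarded, the rest
    abort with the contract label rho."""
    return {l: ("forward", l) if l in I else ("abort", rho) for l in J}

def plus_monitor_step(branches, label):
    # `case b (l => Q_l)`: dispatch on the label received along b.
    return branches[label]
```

Building an explicit abort branch for each label in J \ I mirrors the figure: the monitor never silently drops a label, it either forwards it or raises the alarm ρ.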

For ease of explanation, we omit details for translating casts involving recursive types. Briefly, these casts are translated into recursive processes. For each pair of compatible recursive types A and B, we generate a unique monitor name f and record its type f : {A ← B} in a context Ψ. The translation algorithm needs to take additional arguments, including Ψ, to generate and invoke the appropriate recursive process when needed. For instance, when generating the monitor process for f : {list ← list}, we follow the rule for translating internal

$$\begin{array}{ll}
[[\langle \mathbf{1} \Leftarrow \mathbf{1}\rangle^{\rho}]]\_{a,b} = \mathsf{wait}\ b;\ \mathsf{close}\ a & (\mathbf{1})\\[4pt]
[[\langle A\_1 \multimap A\_2 \Leftarrow B\_1 \multimap B\_2\rangle^{\rho}]]\_{a,b} = x \leftarrow \mathsf{recv}\ a;\ \mathsf{@monitor}\ y \leftarrow [[\langle B\_1 \Leftarrow A\_1\rangle^{\rho}]]\_{y,x} \leftarrow x;\ \mathsf{send}\ b\ y;\ [[\langle A\_2 \Leftarrow B\_2\rangle^{\rho}]]\_{a,b} & (\multimap)\\[4pt]
[[\langle A\_1 \otimes A\_2 \Leftarrow B\_1 \otimes B\_2\rangle^{\rho}]]\_{a,b} = x \leftarrow \mathsf{recv}\ b;\ \mathsf{@monitor}\ y \leftarrow [[\langle A\_1 \Leftarrow B\_1\rangle^{\rho}]]\_{y,x} \leftarrow x;\ \mathsf{send}\ a\ y;\ [[\langle A\_2 \Leftarrow B\_2\rangle^{\rho}]]\_{a,b} & (\otimes)\\[4pt]
[[\langle \forall\{n{:}\tau \mid e\}.A \Leftarrow \forall\{n{:}\tau' \mid e'\}.B\rangle^{\rho}]]\_{a,b} = x \leftarrow \mathsf{recv}\ a;\ \mathsf{assert}\ \rho\ e'(x);\ \mathsf{send}\ b\ x;\ [[\langle A \Leftarrow B\rangle^{\rho}]]\_{a,b} & (\forall)\\[4pt]
[[\langle \exists\{n{:}\tau \mid e\}.A \Leftarrow \exists\{n{:}\tau' \mid e'\}.B\rangle^{\rho}]]\_{a,b} = x \leftarrow \mathsf{recv}\ b;\ \mathsf{assert}\ \rho\ e(x);\ \mathsf{send}\ a\ x;\ [[\langle A \Leftarrow B\rangle^{\rho}]]\_{a,b} & (\exists)\\[4pt]
[[\langle \oplus\{\ell : A\_\ell\}\_{\ell\in I} \Leftarrow \oplus\{\ell : B\_\ell\}\_{\ell\in J}\rangle^{\rho}]]\_{a,b} = \mathsf{case}\ b\ (\ell \Rightarrow Q\_\ell)\_{\ell\in J} & (\oplus)\\
\quad \text{where } Q\_\ell = a.\ell;\ [[\langle A\_\ell \Leftarrow B\_\ell\rangle^{\rho}]]\_{a,b} \ \text{for } \ell \in I \cap J, \text{ and } Q\_\ell = \mathsf{abort}\ \rho \ \text{for } \ell \in J,\ \ell \notin I\\[4pt]
[[\langle \mathbin{\&}\{\ell : A\_\ell\}\_{\ell\in I} \Leftarrow \mathbin{\&}\{\ell : B\_\ell\}\_{\ell\in J}\rangle^{\rho}]]\_{a,b} = \mathsf{case}\ a\ (\ell \Rightarrow Q\_\ell)\_{\ell\in I} & (\mathbin{\&})\\
\quad \text{where } Q\_\ell = b.\ell;\ [[\langle A\_\ell \Leftarrow B\_\ell\rangle^{\rho}]]\_{a,b} \ \text{for } \ell \in I \cap J, \text{ and } Q\_\ell = \mathsf{abort}\ \rho \ \text{for } \ell \in I,\ \ell \notin J\\[4pt]
[[\langle {\uparrow}A \Leftarrow {\uparrow}B\rangle^{\rho}]]\_{a,b} = \mathsf{shift} \leftarrow \mathsf{recv}\ b;\ \mathsf{send}\ a\ \mathsf{shift};\ [[\langle A \Leftarrow B\rangle^{\rho}]]\_{a,b} & ({\uparrow})\\[4pt]
[[\langle {\downarrow}A \Leftarrow {\downarrow}B\rangle^{\rho}]]\_{a,b} = \mathsf{shift} \leftarrow \mathsf{recv}\ a;\ \mathsf{send}\ b\ \mathsf{shift};\ [[\langle A \Leftarrow B\rangle^{\rho}]]\_{a,b} & ({\downarrow})
\end{array}$$

**Fig. 4.** Cast translation

choices. For [[⟨list ⇐ list⟩ρ]]y,x we apply the cons case in the translation to get @monitor y ← f ← x.

#### **5.3 Metatheory**

We prove two formal properties of cast-based monitors: safety and transparency.

Because of the expressiveness of our contracts, a general safety (or blame) theorem is difficult to achieve. However, for cast-based contracts, we can prove that a cast which enforces a subtyping relation, and the corresponding monitor, will not raise an alarm. We first define our subtyping relation in Fig. 5. In addition to the subtyping between refinement types, we also include label subtyping for our session types. A process that offers more external choices can always be used as a process that offers fewer external choices. Similarly, a process that offers fewer internal choices can always be used as a process that offers more internal choices (e.g., non-empty list can be used as a list). The subtyping rules for internal and external choices are drawn from work by Acay and Pfenning [1].

$$\begin{array}{c}
\frac{}{\mathbf{1} \le \mathbf{1}}\ (\mathbf{1}) \qquad
\frac{A \le A' \quad B \le B'}{A \otimes B \le A' \otimes B'}\ (\otimes) \qquad
\frac{A' \le A \quad B \le B'}{A \multimap B \le A' \multimap B'}\ (\multimap)\\[8pt]
\frac{A\_k \le A'\_k \ \text{for } k \in J \quad J \subseteq I}{\oplus\{\mathit{lab}\_k : A\_k\}\_{k\in J} \le \oplus\{\mathit{lab}\_k : A'\_k\}\_{k\in I}}\ (\oplus) \qquad
\frac{A\_k \le A'\_k \ \text{for } k \in I \quad I \subseteq J}{\mathbin{\&}\{\mathit{lab}\_k : A\_k\}\_{k\in J} \le \mathbin{\&}\{\mathit{lab}\_k : A'\_k\}\_{k\in I}}\ (\mathbin{\&})\\[8pt]
\frac{A \le B}{{\downarrow}A \le {\downarrow}B}\ ({\downarrow}) \qquad
\frac{A \le B}{{\uparrow}A \le {\uparrow}B}\ ({\uparrow}) \qquad
\frac{A \le B \quad \tau\_1 \le \tau\_2}{\exists n{:}\tau\_1. A \le \exists n{:}\tau\_2. B}\ (\exists) \qquad
\frac{A \le B \quad \tau\_2 \le \tau\_1}{\forall n{:}\tau\_1. A \le \forall n{:}\tau\_2. B}\ (\forall)\\[8pt]
\frac{\mathit{def}(A) \le \mathit{def}(B)}{A \le B}\ (\mathsf{def}) \qquad
\frac{\forall v{:}\tau,\ [v/x]b\_1 \longrightarrow^{*} \mathsf{true} \ \text{implies}\ [v/x]b\_2 \longrightarrow^{*} \mathsf{true}}{\{x{:}\tau \mid b\_1\} \le \{x{:}\tau \mid b\_2\}}\ (\mathsf{refine})
\end{array}$$

#### **Fig. 5.** Subtyping

For recursive types, we directly examine their definitions. Because of these recursive types, our subtyping rules are co-inductively defined.
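One standard way to realize such a coinductive check is to assume the current goal while verifying its premises, so that revisiting a goal succeeds instead of looping. The sketch below is our own encoding (tuples and a `defs` table are not the paper's notation) and covers only internal choice, end, and recursive definitions, illustrating the "non-empty list ≤ list" example:

```python
# Sketch: coinductive subtype checking (Amadio/Cardelli style), restricted to
# internal choice (oplus), end, and recursive definitions. A goal already on
# the assumption set is accepted: that is the coinductive hypothesis.

def subtype(a, b, defs, seen=frozenset()):
    goal = (repr(a), repr(b))
    if goal in seen:
        return True                     # coinductive hypothesis
    seen = seen | {goal}
    if a[0] == "var":
        return subtype(defs[a[1]], b, defs, seen)   # unfold recursion
    if b[0] == "var":
        return subtype(a, defs[b[1]], defs, seen)
    if a[0] == b[0] == "end":
        return True
    if a[0] == b[0] == "oplus":         # width: fewer internal choices on the left
        return set(a[1]) <= set(b[1]) and all(
            subtype(a[1][l], b[1][l], defs, seen) for l in a[1])
    return False

defs = {
    "list":  ("oplus", {"nil": ("end",), "cons": ("var", "list")}),
    "nlist": ("oplus", {"cons": ("var", "list")}),   # non-empty list
}
print(subtype(("var", "nlist"), ("var", "list"), defs))  # True
```

The check terminates because only finitely many goal pairs can arise from a finite set of definitions.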

We prove a safety theorem (i.e., well-typed casts do not raise alarms) via a standard preservation argument. The key is to show that the monitor process generated by the translation algorithm in Fig. 4 is well-typed under a typing relation which guarantees that no abort state can be reached. We refer to the type system presented thus far in the paper as T; in T, monitors that may evaluate to abort can be typed. We define a stronger type system S, which consists of the rules of T without the abort rule and with the assert rule replaced by the assert strong rule. The new rule for assert, which semantically verifies that the condition b is true using the fact that the refinements are stored in the context Ψ, is shown below. The two type systems are summarized in Fig. 6.

**Theorem 2 (Monitors are well-typed).** *Let* Ψ *be the context containing the type bindings of all recursive processes.*

*1.* Ψ ; b : B ⊢<sub>T</sub> [[A ⇐ B<sup>ρ</sup>]]<sup>Ψ</sup><sub>a,b</sub> :: (a : A)*.*
*2. If* B ≤ A*, then* Ψ ; b : B ⊢<sub>S</sub> [[A ⇐ B<sup>ρ</sup>]]<sup>Ψ</sup><sub>a,b</sub> :: (a : A)*.*

*Proof.* The proof is by induction over the monitor translation rules. For 2, we need to use the subtyping relation to show that (1) for the internal and external choice cases, no branches that include abort are generated; and (2) for the forall and exists cases, the assert never fails (i.e., the assert strong rule applies).

As a corollary, we can show that when executing in a well-typed context, a monitor process translated from a well-typed cast will never raise an alarm.

**Corollary 2 (Well-typed casts cannot raise alarms).** ⊢ C :: (b : B) *and* B ≤ A *imply* C, proc(a, [[A ⇐ B<sup>ρ</sup>]]<sub>a,b</sub>) −̸→<sup>∗</sup> abort(ρ)*.*

Finally, we prove that monitors translated from casts are partial identity processes.

**Fig. 6.** Typing process expressions (systems T and S)

#### **Theorem 3 (Casts are transparent).** b : B ⊢ proc(b, a ← b) ∼ proc(a, [[A ⇐ B<sup>ρ</sup>]]<sub>a,b</sub>) :: (a : A)*.*

*Proof.* We need only show that the translated process passes the partial identity checks. We can show this by induction over the translation rules and by applying the rules in Sect. 4. We note that the rules in Sect. 4 only consider identical types; however, our casts are only between compatible types. Therefore, we can lift A and B to their supertypes (i.e., insert abort cases for mismatched labels), and then apply the checking rules. This does not change the semantics of the monitors.

### **6 Related Work**

There is a rich body of work on higher-order contracts and the correctness of blame assignments in the context of the lambda calculus [2,7,8,10,16,24,25]. The contracts in these papers are mostly based on refinement or dependent types. Our contracts are more expressive than the above, and can encode refinement-based contracts. While our monitors are similar to reference monitors (such as those described by Schneider [19]), they have a few features that are not inherent to reference monitors such as the fact that our monitors are written in the target language. Our monitors are also able to monitor contracts in a higher-order setting by spawning a separate monitor for the sent/received channel.

Disney et al.'s work [9], which investigates behavioral contracts that enforce temporal properties for modules, is closely related to ours. Our contracts (i.e., session types) also enforce temporal properties: the session types specify the order in which messages are sent and received by the processes. Our contracts can also make use of internal state, as can those of Disney et al., but our system is concurrent, while theirs does not consider concurrency.

Recently, gradual typing for two-party session-type systems has been developed [14,20]. Even though this formalism is different from our contracts, the way untyped processes are gradually typed at run time resembles how we monitor type casts. Because of dynamic session types, their system has to keep track of the linear use of channels, which is not needed for our monitors.

Most recently, Melgratti and Padovani have developed chaperone contracts for higher-order session types [17]. Their work is based on a classical interpretation of session types, instead of an intuitionistic one like ours, which means that they do not handle spawning or forwarding processes. While their contracts also inspect messages passed between processes, unlike ours, they cannot model contracts in which the monitor makes use of internal state (e.g., the parenthesis matching). They proved a blame theorem relying on the notion of locally correct modules, a semantic characterization of whether a module satisfies its contract. We did not prove a general blame theorem; instead, we prove a fairly standard safety theorem for cast-based contracts.

The Whip system [27] addresses a problem similar to our prior work [15], but does not use session types. It uses a dependent type system to implement a contract monitoring system that can connect services written in different languages. The system is also higher order, and allows processes that are monitored by Whip to interact with unmonitored processes. While Whip can express dependent contracts, it cannot handle stateful contracts. Another distinguishing feature of our monitors is that they are partial identity processes encoded in the same language as the processes to be monitored.

### **7 Conclusion**

We have presented a novel approach for contract-checking for concurrent processes. Our model uses partial identity monitors which are written in the same language as the original processes and execute transparently. We define what it means to be a partial identity monitor and prove our characterization correct. We provide multiple examples of contracts we can monitor including ones that make use of the monitor's internal state, ones that make use of the idea of probabilistic result checking, and ones that cannot be expressed as dependent or refinement types. We translate contracts in the refinement fragment into monitors, and prove a safety theorem for that fragment.

**Acknowledgment.** This research was supported in part by NSF grant CNS1423168 and a Carnegie Mellon University Presidential Fellowship.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Typing Discipline for Statically Verified Crash Failure Handling in Distributed Systems**

Malte Viering1(B), Tzu-Chun Chen<sup>1</sup>, Patrick Eugster1,2,3, Raymond Hu<sup>4</sup>, and Lukasz Ziarek<sup>5</sup>

<sup>1</sup> Department of Computer Science, TU Darmstadt, Darmstadt, Germany viering@dsp.tu-darmstadt.de

<sup>2</sup> Faculty of Informatics, Università della Svizzera italiana, Lugano, Switzerland

<sup>3</sup> Department of Computer Science, Purdue University, West Lafayette, USA

<sup>4</sup> Department of Computing, Imperial College London, London, UK

<sup>5</sup> Department of Computer Science and Engineering, SUNY Buffalo, Buffalo, USA

**Abstract.** A key requirement for many distributed systems is to be resilient to partial failures, allowing a system to progress despite the failure of some components. This makes programming of such systems daunting, particularly with regard to avoiding inconsistencies due to failures and asynchrony. This work introduces a formal model for crash failure handling in asynchronous distributed systems featuring a lightweight coordinator, modeled in the image of widely used systems such as ZooKeeper and Chubby. We develop a typing discipline based on multiparty session types for this model that supports the specification and static verification of multiparty protocols with explicit failure handling. We show that our type system ensures subject reduction and progress in the presence of failures. In other words, in a well-typed system even if some participants crash during execution, the system is guaranteed to progress in a consistent manner with the remaining participants.

### **1 Introduction**

*Distributed Programs, Partial Failures, and Coordination.* Developing programs that execute across a set of physically remote, networked processes is challenging. The correct operation of a *distributed program* requires correctly designed protocols by which concurrent processes interact asynchronously, and processes correctly implemented according to their roles in the protocols. This becomes particularly challenging when distributed programs have to be resilient to *partial failures*, where some processes crash while others remain operational. Partial failures affect both *safety* and *liveness* of applications. Asynchrony is the key issue, resulting in the inability to distinguish slow processes from failed ones. In general, this makes it impossible for processes to reach agreement, even when only a single process can crash [19].

© The Author(s) 2018

Financially supported by ERC grant FP7-617805 "LiVeSoft - Lightweight Verification of Software", NSF grants CNS-1405614 and IIS-1617586, and EPSRC EP/K034413/1 and EP/K011715/1.

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 799–826, 2018. https://doi.org/10.1007/978-3-319-89884-1\_28

In practice, such impasses are overcome by making appropriate assumptions for the considered infrastructure and applications. One common approach is to assume the presence of a highly available *coordination service* [26] – realized using a set of replicated processes large enough to survive common rates of process failures (e.g., 1 out of 3, 2 out of 5) – and delegating critical decisions to this service. While this *coordinator model* has been in widespread use for many years (cf. *consensus service* [22]), the advent of cloud computing has recently brought it further into the mainstream, via instances like Chubby [4] and ZooKeeper [26]. Such systems are used not only by end applications but also by a variety of frameworks and middleware systems across the layers of the protocol stack [11, 20,31,40].
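The parenthetical replica counts follow the usual majority-quorum arithmetic: to survive f crash failures while retaining a majority, 2f + 1 replicas suffice. A minimal check (the function names are ours, for illustration only):

```python
def replicas_for(f):
    """Minimum replicas whose majority survives f crash failures."""
    return 2 * f + 1

def tolerated(n):
    """Crash failures tolerated by n replicas under majority quorums."""
    return (n - 1) // 2

print(replicas_for(1), tolerated(3))  # 3 replicas tolerate 1 failure
print(replicas_for(2), tolerated(5))  # 5 replicas tolerate 2 failures
```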

*Typing Disciplines for Distributed Programs.* Typing disciplines for distributed programs are a promising and active research area for addressing the challenges in the correct development of distributed programs. See Hüttel et al. [27] for a broad survey. *Session types* are one of the established typing disciplines for message passing systems. Originally developed in the π-calculus [23], they have since been successfully applied to a range of practical languages, e.g., Java [25,41], Scala [39], Haskell [34,38], and OCaml [28,37]. *Multiparty* session types (MPSTs) [15,24] generalize session types beyond two participants. In a nutshell, a standard MPST framework takes (1) a specification of the whole multiparty message protocol as a *global type*; from which (2) *local types*, describing the protocol from the perspective of each participant, are derived; these are in turn used to (3) statically *type check* the I/O actions of endpoint programs implementing the session participants. A well-typed system of session endpoint programs enjoys important safety and liveness properties, such as *no reception errors* (only expected messages are received) and *session progress*. A basic intuition behind MPSTs is that the design (i.e., restrictions) of the type language constitutes a class of distributed protocols for which these properties can be statically guaranteed by the type system.

Unfortunately, *no* MPST work supports protocols for asynchronous distributed programs dealing with *partial failures due to process crashes*, so the aforementioned properties no longer hold in such an event. Several MPST works have treated communication patterns based on *exception messages* (or *interrupts*) [6,7,16]. In these works, such messages may convey exceptional states in an *application* sense; from a protocol compliance perspective, however, these messages are the same as any other message communicated during a *normal* execution of the session. This is in contrast to *process* failures, which may invalidate already in-transit (*orphan*) messages, and where the task of agreeing on the concerted handling of a crash failure is itself prone to such failures.

Outside of session types and other type-based approaches, there have been a number of advances on verifying fault tolerant distributed protocols and applications (e.g., based on model checking [29], proof assistants [44]); however, little work exists on providing direct compile-time support for *programming* such applications in the spirit of MPSTs.

*Contributions and Challenges.* This paper puts forward a new typing discipline for safe specification and implementation of distributed programs prone to process crash failures based on MPSTs. The following summarizes the key challenges and contributions.


To fit our model to practice, we introduce programming constructs similar to well-known and intuitive exception handling mechanisms, for handling concurrent and asynchronous process crash failures in sessions. These constructs serve to integrate user-level session control flow in endpoint processes and the underlying communications with the coordination service, used by the target applications of our work to outsource critical failure management decisions (see Fig. 1). It is important to note that the coordinator does *not* magically solve all problems. Key design challenges are to ensure that communication with it is fully asynchronous, as in real life, and that it is involved only in a "minimal" fashion. Thus we treat the coordinator as a first-class, asynchronous network artifact, as opposed to a convenient but impractical global "oracle" (cf. [6]), and our operational semantics of multiparty sessions remains primarily *choreographic* in the original spirit of distributed MPSTs, unlike works that resort to a centralized *orchestrator* to conduct all actions [5,8]. As depicted in Fig. 1, application-specific communication does not involve the coordinator. Our model lends itself to common practical scenarios where processes monitor each other in a peer-based fashion to detect failures, and rely on a coordinator only to establish agreement on which processes have failed, and when.
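The division of labor described above can be sketched as follows (our own simplification, not the paper's calculus): peers report crash suspicions, but only the coordinator commits failure sets, so every live peer observes the same failures in the same order and runs the same handlers.

```python
# Sketch: peers detect failures in a peer-based fashion and report them; the
# coordinator alone decides which processes have failed, and in what order.

class Coordinator:
    def __init__(self):
        self.failed = set()     # committed failed processes
        self.log = []           # decision order, one entry per new failure

    def suspect(self, reporter, suspected):
        if suspected not in self.failed:   # first report wins; duplicates ignored
            self.failed.add(suspected)
            self.log.append(frozenset(self.failed))
        return self.log[-1]     # the decision every peer is notified of

coord = Coordinator()
d1 = coord.suspect("Pb", "Pa")  # Pb suspects Pa   -> decision {Pa}
d2 = coord.suspect("Pc", "Pa")  # duplicate report -> same decision {Pa}
d3 = coord.suspect("Pc", "Pd")  # Pd also crashes  -> decision {Pa, Pd}
print(d1 == d2, d3 == frozenset({"Pa", "Pd"}))  # True True
```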

A long version of this paper is available online [43]. The long version contains: full formal definitions, full proofs, and a prototype implementation in Scala.

*Example.* As a motivating example, Fig. 2 gives a global formal specification for a big data streaming task between a distributed file system (DFS) *dfs* and two workers *w<sub>1</sub>*, *w<sub>2</sub>*. The DFS streams data to the two workers, which process the data and write the result back. Most DFSs have built-in fault tolerance mechanisms [20], so we consider *dfs* to be *robust*, denoted by the annotation [*dfs*]; the workers, however, may individually fail. In the *try-handle* construct t(...)h(...), the *try-block* t(...) gives the *normal* (i.e., failure-free) flow of the protocol, and h(...) contains the explicit *handlers* for potential crashes. In the try-block, the workers receive data from the DFS (*dfs*→*w<sub>i</sub>*), perform local computations, and send back the result (*w<sub>i</sub>*→*dfs*). If a worker crashes ({*w<sub>i</sub>*}: ...), the other worker takes over the computation of the crashed worker, allowing the system to still produce a valid result. If both workers crash (by any interleaving of their concurrent crash events), the global type specifies that the DFS should safely terminate its role in the session.

**Fig. 1.** Coordinator model for asynchronous distributed systems. The coordinator is implemented by replicated processes (internals omitted).

$$\begin{array}{l} [dfs]\;G = \mathsf{t}(\mu t.\\ \quad dfs \to w_1\ l_{d_1}(S).\, dfs \to w_2\ l_{d_2}(S).\\ \quad w_1 \to dfs\ l_{r_1}(S').\, w_2 \to dfs\ l_{r_2}(S').\, t\\ )\mathsf{h}(\\ \quad \{w_1\}:\ \mu t'.\, dfs \to w_2\ l'_{d_1}(S).\, w_2 \to dfs\ l'_{r_1}(S').\, t',\\ \quad \{w_2\}:\ \ldots,\\ \quad \{w_1, w_2\}:\ \mathsf{end}) \end{array}$$

**Fig. 2.** Global type for a big data streaming task with failure handling capabilities.

We shall refer back to this basic example, which focuses on the new failure handling constructs, in explanations in later sections. We also give many further examples throughout the following sections to illustrate the potential session errors due to failures exposed by our model, and how our framework resolves them to recover MPST safety and progress.
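The failover behavior specified by Fig. 2 can be simulated operationally; the sketch below is our own simulation of that logic (not the calculus): the DFS streams chunks to the live workers, a survivor takes over a crashed worker's share, and the DFS ends cleanly once both workers are gone.

```python
# Sketch of the Fig. 2 failure-handling logic. crash_at maps a worker to the
# index of the chunk at which its crash is detected.

def run(chunks, crash_at):
    live, results = ["w1", "w2"], []
    for i, chunk in enumerate(chunks):
        live = [w for w in live if crash_at.get(w) != i]  # drop crashed workers
        if not live:                                      # handler {w1, w2}: end
            return "end", results
        worker = live[i % len(live)]                      # survivor takes over
        results.append((worker, chunk))                   # l_d / l_r round trip
    return "done", results

print(run([10, 20, 30], {}))                  # both workers share the stream
print(run([10, 20, 30], {"w1": 1}))           # w2 takes over from chunk 1 on
print(run([10, 20, 30], {"w1": 1, "w2": 2}))  # all workers lost: DFS ends
```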

*Roadmap.* Section 2 describes the adopted system and failure model. Section 3 introduces global types for guiding failure handling. Section 4 introduces our process calculus with failure handling capabilities and a coordinator. Section 5 introduces local types, derived from global types by projection. Section 6 describes typing rules, and defines *coherence* of session environments with respect to endpoint crashes. Section 7 states properties of our model. Section 8 discusses related work. Section 9 draws conclusions.

### **2 System and Failure Model**

In distributed systems, care is required to avoid partial failures affecting the liveness (e.g., waiting on messages from crashed processes) or safety (e.g., when processes manage to communicate with some peers but not others before crashing) properties of applications. Based on the nature of the infrastructure and application, appropriate *system and failure models* are chosen, along with judiciously made assumptions, to overcome such impasses in practice.

We pinpoint the key characteristics of our model, following our practical motivations and standard distributed systems literature, that shape the design choices we make later for the process calculus and types. As is common, we augment our system with a *failure detector* (FD) to allow slow processes to be distinguished from failed ones. The advantages of the FD are that (1) for reasoning, it concentrates all the assumptions needed to solve a given problem, and (2) implementation-wise, it yields a single main module where time-outs are set and used.
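A timeout-based FD of the kind alluded to here can be sketched in a few lines (our own illustration, with a simulated clock rather than real timers): a process is suspected once no heartbeat has been seen for longer than the configured timeout.

```python
# Sketch: heartbeat failure detector. Suspicion does not distinguish slow
# from crashed processes; it only concentrates the timing assumptions.

class FailureDetector:
    def __init__(self, timeout):
        self.timeout, self.last_seen = timeout, {}

    def heartbeat(self, p, now):
        self.last_seen[p] = now           # p showed signs of life at time `now`

    def suspects(self, now):
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}

fd = FailureDetector(timeout=3)
fd.heartbeat("Pa", now=0)
fd.heartbeat("Pb", now=0)
fd.heartbeat("Pb", now=2)                 # Pb stays alive; Pa goes silent
print(fd.suspects(now=4))                 # {'Pa'}: slow or crashed?
```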

Concretely we make the following assumptions on failures and the system:


(1)–(3) are standard in literature on fault-tolerant distributed systems [19].

Note that processes can still recover, but will not do so *within* sessions (or will not be re-considered for those). Other failure models, e.g., network partitions [21] or Byzantine failures [32], are the subject of future work. The former are not tolerated by ZooKeeper and similar systems, and the latter have often been argued to be too generic a failure model (e.g., [3]).

The assumption on the coordinator (4) implicitly means that the number of concomitant failures among the coordinator replicas is assumed to remain within a minority, and that failed replicas are replaced in time (to tolerate further failures). Without loss of validity, the coordinator internals can be treated as a black box. The final assumption (5) on failure detection is backed in practice by the concept of *program-controlled* crash [10], which consists of also communicating decisions to disregard supposedly failed processes to those very processes, prompting them to reset themselves upon false suspicion. In practice, systems can be configured to minimize the probability of such events, and by a "two-level" membership consisting of evicting processes from *individual* sessions (cf. recovery above) more quickly than from the system as a whole; several authors have also proposed network support to entirely avoid false suspicions (e.g., [33]).

These assumptions do not make handling of failures trivial, let alone mask them. For instance, the network can arbitrarily delay messages and thus reorder them with respect to their real sending times, and (so) different processes can detect failures at different points in time and in different orders.

**Fig. 3.** Syntax of global types with explicit handling of partial failures.

### **3 Global Types for Explicit Handling of Partial Failures**

Based on the foundations of MPSTs, we develop *global types* to formalize specifications of distributed protocols with explicit handling of *partial failures due to role crashes*, simply referred to as *failures*. We present global types before introducing the process calculus to provide a high-level intuition of how failure handling works in our model.

The syntax of *global types* is depicted in Fig. 3. We use the following base notations: *p*, *q*, ... for *role* (i.e., participant) names; l<sub>1</sub>, l<sub>2</sub>, ... for message *labels*; and *t*, *t*′, ... for type variables. *Base types* S range over bool, int, etc.

Global types are denoted by G. We first summarize the constructs from standard MPST [15,24]. A *branch* type p → q{l<sub>i</sub>(S<sub>i</sub>).G<sub>i</sub>}<sub>i∈I</sub> means that *p* can send to *q one* of the messages of type S<sub>k</sub> with label l<sub>k</sub>, where k is a member of the non-empty index set I. The protocol then proceeds according to the continuation G<sub>k</sub>. When I is a singleton, we may simply write p→q l(S).G. We use *t* for type variables and take an equi-recursive view, i.e., μ*t*.G and its unfolding [μ*t*.G/*t*]G are equivalent. We assume type variable occurrences are bound and guarded (e.g., μ*t*.*t* is not permitted). end denotes termination.
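For instance, under the equi-recursive view a one-message loop and its one-step unfolding denote the same protocol (a generic example, not tied to Fig. 2):

$$\mu t.\,G \;=\; [\mu t.\,G/t]\,G, \qquad \text{e.g.}\quad \mu t.\,p \to q\;l(S).\,t \;=\; p \to q\;l(S).\,\mu t.\,p \to q\;l(S).\,t$$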

We now introduce our extensions for partial failure handling. A *try-handle* t(G<sub>1</sub>)h(*H*)<sup>κ</sup>.G<sub>2</sub> describes a "failure-atomic" protocol unit: all *live* (i.e., non-crashed) roles will eventually reach a consistent protocol state, despite any concurrent and asynchronous role crashes. The try-block G<sub>1</sub> defines the *default* protocol flow, and *H* is a *handling environment*. Each element of *H* maps a *handler signature* F, which specifies a set of *failed* roles {*p*<sub>i</sub>}<sub>i∈I</sub>, to a *handler body* specified by a G. The handler body G specifies how the live roles should proceed given the failure of the roles in F. The protocol then proceeds (for live roles) according to the continuation G<sub>2</sub> after the default block G<sub>1</sub> or the failure handling defined in *H* has been completed as appropriate.

To simplify later technical developments, we annotate each try-handle term in a given G by a unique κ ∈ N that lexically identifies the term within G. These annotations may be assigned mechanically. As a shorthand, we refer to the try-block and handling environment of a particular try-handle by its annotation; e.g., we use κ to stand for t(G<sub>1</sub>)h(*H*)<sup>κ</sup>. In the running examples (e.g., Fig. 2), when there is only one try-handle, we omit κ for simplicity.

*Top-Level Global Types and Robust Roles.* We use the term *top-level* global type to mean the source protocol specified by a user, following a typical top-down interpretation of MPST frameworks [15,24]. We allow top-level global types to be optionally annotated as [˜*p*]G, where [˜*p*] specifies a set of *robust* roles—i.e., roles that can be assumed to never fail. In practice, a participant may be robust if it is replicated or is made inherently fault tolerant by other means (e.g., the participant that represents the distributed file system in Fig. 2).

*Well-Formedness.* The first stage of validation in standard MPSTs is to check that the top-level global type satisfies the supporting criteria used to ensure the desired properties of the type system. We first list basic syntactic conditions which we assume on any given G: (i) each F is non-empty; (ii) a role in an F cannot occur in the corresponding handler body (a failed role cannot be involved in the handling of its own failure); and (iii) every occurrence of a non-robust role *p* must be contained within a (possibly outer) try-handle that has a handler signature {*p*} (the protocol must be able to handle its potential failure). Lastly, to simplify the presentation without loss of generality, we impose that separate branch types *not* defined in the same default block or handler body have disjoint label sets. This can be implicitly achieved by combining label names with try-handle annotations.

Assuming the above, we define *well-formedness* for our extended global types. We write G′ ∈ G to mean that G′ syntactically occurs in G (∈ is reflexive); similarly for the variations κ ∈ G and κ ∈ κ′. Recall that κ is shorthand for t(G<sub>1</sub>)h(*H*)<sup>κ</sup>. We use a lookup function *outer<sub>G</sub>*(κ) for the set of all try-handles in G that enclose a given κ, including κ itself, defined by *outer<sub>G</sub>*(κ) = {κ′ | κ ∈ κ′ ∧ κ′ ∈ G}.

**Definition 1 (Well-formedness).** Let κ stand for t(G<sub>1</sub>)h(*H*)<sup>κ</sup>, and κ′ for t(G′<sub>1</sub>)h(*H*′)<sup>κ′</sup>. A global type G is *well-formed* if both of the following conditions hold for all κ ∈ G:

1. ∀F<sub>1</sub> ∈ *dom*(*H*). ∀F<sub>2</sub> ∈ *dom*(*H*). ∃κ′ ∈ *outer<sub>G</sub>*(κ) s.t. F<sub>1</sub> ∪ F<sub>2</sub> ∈ *dom*(*H*′)
2. There is no F ∈ *dom*(*H*) such that ∃κ′ ∈ *outer<sub>G</sub>*(κ). ∃F′ ∈ *dom*(*H*′) with κ′ ≠ κ ∧ F′ ⊆ F

The first condition asserts that for any two handler signatures of the handling environment of κ, there always exists a handler whose handler signature matches the union of their respective failure sets – this handler is either inside the handling environment of κ itself, or in the handling environment of an outer try-handle. This ensures that if roles are active in different handlers of the same try-handle then there is a handler whose signature corresponds to the union over the signatures of those different handlers. Examples 2 and 3 in Sect. 4 illustrate a case where this condition is needed. The second condition asserts that if the handling environment of a try-handle contains a handler for F, then there is no outer try-handle with a handler for F′ such that F′ ⊆ F. The reason for this condition is that in the case of *nested* try-handles, our communication model allows separate try-handles to start failure handling independently (the operational semantics will be detailed in the next section; see (**TryHdl**) in Fig. 6). The aim is to have the relevant roles eventually converge on performing the handling of the outermost try-handle, possibly by interrupting the handling of an inner try-handle. Consider the following example:

*Example 1.* G = t(t(G′)h({*p<sub>1</sub>*, *p<sub>2</sub>*} : G<sub>1</sub>)<sup>2</sup>)h({*p<sub>1</sub>*} : G′<sub>1</sub>)<sup>1</sup> violates condition 2 because, when *p<sub>1</sub>* and *p<sub>2</sub>* have both failed, the handler with signature {*p<sub>1</sub>*} will still be triggered (i.e., the outer try-handle will eventually take over). It is not sensible to run G′<sub>1</sub> instead of G<sub>1</sub> (which is for the crashes of *p<sub>1</sub>* and *p<sub>2</sub>*).
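Read this way, Definition 1 is directly checkable on the tree of try-handles. The sketch below is our own encoding (the containment direction in condition 2 follows our reading of Example 1): each try-handle κ has a parent (None for the outermost) and a set of handler signatures dom(H).

```python
# Sketch checker for the two well-formedness conditions of Definition 1.

def outer(kappa, parent):                 # outer_G(kappa), including kappa itself
    while kappa is not None:
        yield kappa
        kappa = parent[kappa]

def well_formed(parent, dom):
    for k in dom:
        # 1. the union of any two signatures of k is handled at k or outward
        for f1 in dom[k]:
            for f2 in dom[k]:
                if not any(f1 | f2 in dom[k2] for k2 in outer(k, parent)):
                    return False
        # 2. no signature of k is already covered by a strictly outer handler
        if any(f2 <= f1 for f1 in dom[k]
               for k2 in outer(k, parent) if k2 != k for f2 in dom[k2]):
            return False
    return True

# Example 1: inner handler {p1, p2} under an outer handler {p1}: ill-formed.
parent = {1: None, 2: 1}
dom = {1: {frozenset({"p1"})}, 2: {frozenset({"p1", "p2"})}}
print(well_formed(parent, dom))  # False (condition 2 is violated)
```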

**Fig. 4.** Challenges under pure asynchronous interactions with a coordinator. Between time (1) and time (2), the task φ = (κ, ∅) is interrupted by the crash of Pa. Between time (3) and time (4), due to asynchrony and multiple crashes, P<sup>c</sup> starts handling the crash of {Pa, P<sup>d</sup>} without handling the crash of {P<sup>a</sup>}. Finally after (4) P<sup>b</sup> and P<sup>c</sup> finish their common task.

### **4 A Process Calculus for Coordinator-Based Failure Handling**

Figure 4 depicts a scenario that can occur in practical asynchronous systems with coordinator-based failure handling through frameworks such as ZooKeeper (Sect. 2). Using this scenario, we first illustrate challenges, formally define our model, and then develop a safe type system.

The scenario corresponds to a global type of the form <sup>t</sup>(G)h({P<sup>a</sup>} : <sup>G</sup>a, {Pa, P<sup>d</sup>} : <sup>G</sup>ad, ...)<sup>κ</sup>, with processes <sup>P</sup>a..d and a coordinator *<sup>Ψ</sup>*. We define a *task* to mean a unit of interactions, which includes failure handling behaviors. Initially all processes are collaborating on a task <sup>φ</sup>, which we label (κ, <sup>∅</sup>) (identifying the task context, and the set of failed processes). The shaded boxes signify which task each process is working on. Dotted arrows represent notifications between processes and *Ψ* related to task completion, and solid arrows for failure notifications from *Ψ* to processes. During the scenario, P<sup>a</sup> first fails, then <sup>P</sup><sup>d</sup> fails: the execution proceeds through failure handling for {P<sup>a</sup>} and {Pa, P<sup>d</sup>}.

(I) When P<sup>b</sup> reaches the end of its part in φ, the application has P<sup>b</sup> notify *Ψ*. P<sup>b</sup> then remains in the context of φ (the continuation of the box after notifying) in consideration of other non-robust participants still working on φ—P<sup>b</sup> may yet need to handle their potential failure(s).


**Fig. 5.** Grammar for processes, applications, systems, and evaluation contexts.


*Processes.* Figure 5 defines the grammar of processes and (distributed) applications. Expressions e, ei, .. can be values v, vi, ..., variables x, xi, ..., and standard operations. (Application) processes are denoted by P, Pi, .... An initialization a[p](y).P agrees to play role p via shared name a and takes actions defined in P; actions are executed on a session channel c : η, where c ranges over s[*p*] (session name and role name) and session variables y; η represents action statements.

A try-handle t(η)h(H)<sup>φ</sup> attempts to execute the local action η, and can handle failures occurring therein as defined in the handling environment H, analogously to global types. H thus also maps a handler signature F to a handler body η defining how to handle F. The annotation φ = (κ, F) is composed of two elements: the identity κ of a *global* try-handle, and an indication of the *current* handler signature, which can be empty. F = ∅ means that the default try-block is executing, whereas F ≠ ∅ means that the handler body for F is executing. Term 0 only occurs in a try-handle at runtime; it denotes *yielding* for a *notification* from the *coordinator* (introduced shortly).

Other statements are similar to those defined in [15,24]. Term 0 represents an *idle* action. By convention, we omit 0 at the end of a statement. Action p!l(e).η represents a sending action that sends *p* a label l with content e, then continues as η. Branching p?{li(xi).ηi}i∈I represents a receiving action from *p* with several possible branches. When label lk is selected, the transmitted value v is saved in xk, and ηk{v/xk} continues. For convenience, when there is only one branch, the curly brackets are omitted, e.g., c : p?l(x).P means there is only one branch l(x). X⟨e⟩ is a statement variable with one parameter e, and def D in η is for recursion, where declaration D defines the recursive body that can be called in η. The conditional statement is standard.

The structure of processes ensures that failure handling is not interleaved between different sessions. However, we note that in standard MPSTs [15,24], session interleaving must anyway be prohibited for the basic progress property. Since our aim will be to show progress, we disallow session interleaving within process bodies. Our model does allow parallel sessions at the top-level, whose actions may be concurrently interleaved during execution.

*(Distributed) Systems.* A (distributed) *system* in our programming framework is a composition of an application, which contains more than one process, and a coordinator (cf. Fig. 1). A system can be running within a private session s, represented by (νs)S, or be a composition S|S' of systems running in different sessions independently and in parallel (i.e., no session interleaving). The job of the coordinator is to ensure that even in the presence of failures there is consensus on whether all participants in a given try-handle completed their local actions, or whether failures need to be handled, and which ones. We use *Ψ* = G : (F, d) to denote a (robust) coordinator for the global type G, which stores in F the failures that occurred in the application, and in d the done notifications sent to the coordinator. The coordinator is denoted by ψ when viewed as a role.

A (distributed) *application*<sup>1</sup> is a process P, a parallel composition N |N', or a global queue carrying messages s : h. A global queue s : h carries a sequence of messages m, sent by participants in session s. A message is either a regular message ⟨*p*, *q*, l(v)⟩ with label l and content v sent from *p* to *q*, or a *notification*. A notification may contain the role of a coordinator. There are *done* and *failure* notifications, with two kinds of done notifications dn used for coordination: ⟨*p*, ψ⟩^φ notifies ψ that *p* has finished its local actions of the try-handle φ; ⟨ψ, *p*⟩^φ is sent from ψ to notify *p* that ψ has received all done notifications for the try-handle φ, so that *p* shall end its current try-handle and move to its next task. For example, in Fig. 4 at time (4) the coordinator will inform Pb and Pc via ⟨ψ, Pb⟩^(κ,{Pa,Pd}) · ⟨ψ, Pc⟩^(κ,{Pa,Pd}) that they can finish the try-handle (κ, {Pa, Pd}). Note that the appearance of ⟨ψ, *p*⟩^φ implies that the coordinator has been informed that all participants in φ have completed their local actions. We define two kinds of *failure* notifications: [ψ, crash F] notifies ψ that F occurred, e.g., F = {*q*} means *q* has failed; [*p*, crash F] is sent from ψ to notify *p* about the failure F for possible handling. We write [p̃, crash F], where p̃ = *p1*, ..., *pn*, short for [*p1*, crash F] · ... · [*pn*, crash F]; similarly ⟨ψ, p̃⟩^φ.

<sup>1</sup> Other works use the term *network*, which is the reason why we use N instead of, e.g., A. We call it an application to avoid confusion with the physical network which interconnects all processes as well as the coordinator.

$$\begin{array}{ccc} \frac{N\_1 \to N\_2}{\psi \bullet N\_1 \to \psi \bullet N\_2} & \frac{\mathcal{S} \to \mathcal{S}'}{(\nu \mathfrak{s}) \mathcal{S} \to (\nu \mathfrak{s}) \mathcal{S}'} \end{array} \tag{\mathsf{Sys}, \mathsf{New}}$$

$$\frac{p \text{ non-robust}}{\mathsf{s}[p] : \eta \mid N \mid \mathsf{s} : h \;\to\; N \mid \mathsf{s} : remove(h, p) \cdot [\psi, \mathsf{crash}\ \{p\}]} \tag{\mathsf{Crash}}$$

**Fig. 6.** Operational semantics of distributed applications, for local actions.

Following the tradition of other MPST works, the global queue provides an abstraction for multiple FIFO queues, one between each pair of endpoints (cf. TCP), with no global ordering. Therefore mi · mj can be permuted to mj · mi in the global queue if the two messages differ in sender or in receiver. For example, the following messages are permutable: ⟨*p*, *q*, l(v)⟩ · ⟨*p*, *q'*, l'(v')⟩ if q ≠ q', as well as ⟨*p*, *q*, l(v)⟩ · ⟨ψ, *p*⟩^φ and ⟨*p*, *q*, l(v)⟩ · [*q*, crash F]. But ⟨ψ, *p*⟩^φ · [*p*, crash F] is not permutable: both have the same sender and receiver (ψ is the sender of [*p*, crash F]).
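The per-pair FIFO reading of the global queue can be phrased as a commutation check on adjacent entries. A minimal sketch, with our own `Msg` encoding (not the paper's syntax) and ψ treated as an ordinary endpoint:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Msg:
    sender: str    # role name, e.g. "p", or "psi" for the coordinator
    receiver: str
    payload: str   # label/value or notification kind

def permutable(m1: Msg, m2: Msg) -> bool:
    """m1 · m2 may be reordered to m2 · m1 iff the two entries differ in
    sender or in receiver (per-pair FIFO, no global order)."""
    return m1.sender != m2.sender or m1.receiver != m2.receiver

# <p, q, l(v)> and <p, q', l'(v')> commute when q != q'
assert permutable(Msg("p", "q", "l(v)"), Msg("p", "q2", "l'(v')"))
# <psi, p>^phi and [p, crash F] share sender (psi) and receiver (p): fixed order
assert not permutable(Msg("psi", "p", "done"), Msg("psi", "p", "crash F"))
```

The second assertion is exactly the non-permutable pair discussed above: a done notification and a failure notification from ψ to the same *p* keep their relative order.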

*Basic Dynamic Semantics for Applications.* Figure 6 shows the operational semantics of applications. We use evaluation contexts as defined in Fig. 5. Context E is either a hole [ ], a default context t(E)h(H)<sup>φ</sup>.η, or a recursion context def D in E. We write E[η] to denote the action statement obtained by filling the hole in <sup>E</sup>[·] with <sup>η</sup>.

Rule (**Link**) says that (local) processes that agree on shared name a, obeying some protocol (global type) and playing certain roles *pi* as represented by a[*pi*](yi).Pi, together start a private session s; this results in replacing every variable yi in Pi with s[*pi*], while at the same time creating a new empty global queue s : ∅ and appointing a coordinator G : (∅, ∅), which is novel in our work.

Rule (**Snd**) in Fig. 6 reduces a sending action q!l(e) by emitting a message ⟨*p*, *q*, l(v)⟩ to the global queue s : h. Rule (**Rcv**) reduces a receiving action if the message arriving at its end is sent from the expected sender with an expected label. Rule (**Rec**) is for recursion. When the recursive body, defined inside η, is called by X⟨e⟩ where e evaluates to v, it reduces to the statement η{v/x}, which again implements the recursive body. Rule (**Str**) says that processes that are structurally congruent have the same reduction. Processes, applications, and systems are considered modulo structural congruence, denoted by ≡, along with α-renaming. Rules (**Par**) and (**Str**) together state that a parallel composition has a reduction if its sub-application can reduce. Rule (**Sys**) states that a system has a reduction if its application has a reduction, and (**New**) says a reduction can proceed under a session. Rule (**Crash**) states that a process on channel s[*p*] can fail at any point in time. (**Crash**) also adds a notification [ψ, crash F] which is sent to ψ (the coordinator). This is an abstraction for the failure detector described in Sect. 2 (5): the notification [ψ, crash F] is the first such notification issued by a participant based on its local failure detector. Adding the notification into the global queue, instead of making the coordinator immediately aware of it, models that failures are only detected eventually. Note that a failure is not annotated with a level because failures transcend all levels, and asynchrony makes it impossible to identify "where" exactly they occurred. As a failure is permanent, it can affect multiple try-handles.
The (**Crash**) rule does not apply to participants which are robust, i.e., that conceptually cannot fail (e.g., *dfs* in Fig. 2). Rule (**Crash**) removes channel s[*p*] (the failed process) from application N, and removes messages and notifications delivered from, or heading to, the failed *p* by function *remove*(h, *p*). Function *remove*(h, *p*) returns a new queue after removing all regular messages and notifications that contain *p*. For example, let h = ⟨*p2*, *p1*, l(v)⟩ · ⟨*p3*, *p2*, l'(v')⟩ · ⟨*p3*, *p4*, l''(v'')⟩ · ⟨*p2*, ψ⟩^φ · [*p2*, crash {*p3*}] · ⟨ψ, *p2*⟩^φ; then *remove*(h, *p2*) = ⟨*p3*, *p4*, l''(v'')⟩. Messages are removed to model that in a real system send/receive does *not* constitute an atomic action.

*Handling at Processes.* Failure handling, defined in Fig. 7, is based on the observations that (i) a process that fails stays down, and (ii) multiple processes can fail. As a consequence a failure can trigger multiple failure handlers, either because these handlers are in different (subsequent) try-handles or because of additional failures. Therefore a process needs to retain the information of *who* failed. For simplicity we do not model state at processes; instead, processes read but do not remove failure notifications from the global queue. We define *Fset*(h, *p*) to return the union of failures for which there are notifications heading to *p*, i.e., [*p*, crash F], issued by the coordinator in queue h *up to the first done notification heading to p*:

#### **Definition 2 (Union of Existing Failures *Fset*(h, *p*))**

$$Fset(\emptyset, p) = \emptyset \qquad Fset(h, p) = \begin{cases} F \cup Fset(h', p) & \text{if } h = [p, \mathsf{crash}\ F] \cdot h' \\ \emptyset & \text{if } h = \langle \psi, p \rangle^{\phi} \cdot h' \\ Fset(h', p) & \text{otherwise} \end{cases}$$

In short, if the global queue is ∅, then naturally there are no failure notifications. If the global queue contains a failure notification sent from the coordinator, say [*p*, crash F], we collect the failure. If the global queue contains a done notification ⟨ψ, *p*⟩^φ sent from the coordinator, then *all* participants in φ have finished their local actions, which implies that the try-handle φ can be completed. Our failure handling semantics, (**TryHdl**), allows a try-handle φ = (κ, F) to handle different failures or sets of failures by allowing a try-handle to switch between different handlers. F thus denotes the current set of handled failures. For simplicity we refer to this as the *current(ly handled) failure set*. This is a slight abuse of terminology, done for brevity, as obviously failures are only detected with a

$$\frac{F' = \cup \{ A \mid A \in dom(\mathsf{H}) \land F \subset A \subseteq Fset(h, p) \} \qquad F' : \eta' \in \mathsf{H}}{\mathsf{s}[p] : E[\mathsf{t}(\eta)\mathsf{h}(\mathsf{H})^{(\kappa, F)}.\eta''] \mid \mathsf{s} : h \to \mathsf{s}[p] : E[\mathsf{t}(\eta')\mathsf{h}(\mathsf{H})^{(\kappa, F')}.\eta''] \mid \mathsf{s} : h} \tag{\mathsf{TryHdl}}$$

$$\mathsf{s}[p] : E[\mathsf{t}(\mathsf{0})\mathsf{h}(\mathsf{H})^{\phi}.\eta] \mid \mathsf{s} : h \to \mathsf{s}[p] : E[\mathsf{t}(\underline{\mathsf{0}})\mathsf{h}(\mathsf{H})^{\phi}.\eta] \mid \mathsf{s} : h \cdot \langle p, \psi \rangle^{\phi} \tag{\mathsf{SndDone}}$$

$$\frac{\langle \psi, p \rangle^{\phi} \in h}{\mathsf{s}[p] : E[\mathsf{t}(\underline{\mathsf{0}})\mathsf{h}(\mathsf{H})^{\phi}.\eta] \mid \mathsf{s} : h \to \mathsf{s}[p] : E[\eta] \mid \mathsf{s} : h \setminus \langle \psi, p \rangle^{\phi}} \tag{\mathsf{RcvDone}}$$

$$\frac{l \notin labels(E[\eta])}{\mathsf{s}[p] : E[\eta] \mid \mathsf{s} : \langle q, p, l(v) \rangle \cdot h \to \mathsf{s}[p] : E[\eta] \mid \mathsf{s} : h} \tag{\mathsf{Cln}}$$

$$\frac{\langle \psi, p \rangle^{\phi} \in h \qquad \phi \notin E[\eta]}{\mathsf{s}[p] : E[\eta] \mid \mathsf{s} : h \to \mathsf{s}[p] : E[\eta] \mid \mathsf{s} : h \setminus \langle \psi, p \rangle^{\phi}} \tag{\mathsf{ClnDone}}$$

**Fig. 7.** Operational semantics of distributed applications, for endpoint handling.

certain lag. The handling strategy for a process is to handle the currently largest set of failed processes that this process has been informed of and is able to handle. This largest set is calculated by ∪{A | A ∈ *dom*(H) ∧ F ⊂ A ⊆ *Fset*(h, *p*)}, which selects all failure sets that are larger than the current one (A ∈ *dom*(H) ∧ F ⊂ A) if they are also triggered by known failures (A ⊆ *Fset*(h, *p*)). Condition F' : η' ∈ H in (**TryHdl**) ensures that there exists a handler for F'. The following example shows how (**TryHdl**) is applied to switch handlers.

*Example 2.* Take h such that *Fset*(h, *p*) = {*p1*} and H = {*p1*} : η1, {*p2*} : η2, {*p1*, *p2*} : η12 in process P = s[*p*] : t(η1)h(H)^(κ,{*p1*}), which indicates that P is handling failure {*p1*}. Assume now one more failure occurs, resulting in a new queue h' such that *Fset*(h', *p*) = {*p1*, *p2*}. By (**TryHdl**), the process acting at s[*p*] switches to handling the failure set {*p1*, *p2*}, such that P = s[*p*] : t(η12)h(H)^(κ,{*p1*,*p2*}) (notice also the η12 inside the try-block). A switch to only handling {*p2*} does not make sense, since, e.g., η2 can contain *p1*. Figure 2 shows a case where the handling strategy differs according to the number of failures.
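The handler switch of (**TryHdl**) is a small set computation. A minimal sketch with handler environments as dicts keyed by frozensets (our encoding, not the paper's):

```python
# Hypothetical sketch of the (TryHdl) premise:
#   F' = U{ A | A in dom(H), F strict-subset A subset-of Fset(h, p) },
# with the side condition F' : eta' in H.
def next_handler(H, F, known):
    """H: dict from frozenset signatures to handler bodies; F: currently
    handled set; known: Fset(h, p). Returns the new signature F', or None
    when the rule is not applicable."""
    candidates = [A for A in H if F < A <= known]   # strictly larger, covered
    if not candidates:
        return None
    F2 = frozenset().union(*candidates)
    return F2 if F2 in H else None                  # F' must have a handler

# Example 2: while handling {p1}, learning {p1, p2} switches to that handler.
H = {frozenset({"p1"}): "h1", frozenset({"p2"}): "h2",
     frozenset({"p1", "p2"}): "h12"}
assert next_handler(H, frozenset({"p1"}), frozenset({"p1", "p2"})) \
    == frozenset({"p1", "p2"})
```

With no handler for the union (as in Example 3, where H only covers {*p1*} and {*p2*}), the side condition fails and `next_handler` returns `None`: the rule is not applicable at this level.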

In Sect. 3 we formally define well-formedness conditions, which guarantee that if there exist two handlers for two different handler signatures in a try-handle, then a handler exists for their union. The following example demonstrates why such a guarantee is needed.

*Example 3.* Assume a slightly different P compared to the previous examples (no handler for the union of failures): P = s[*p*] : E[t(η)h(H)^(κ,∅)] with H = {*p1*} : η1, {*p2*} : η2. Assume also that *Fset*(h, *p*) = {*p1*, *p2*}. Here (**TryHdl**) does not apply since there is no failure handling for {*p1*, *p2*} in P. If we allowed a handler for either {*p1*} or {*p2*} to be triggered, we would have no guarantee that the other participants involved in this try-handle all select the same failure set. Even with a deterministic selection, i.e., all participants in that try-handle selecting the same handling activity, there needs to be a handler with handler signature {*p1*, *p2*}, since it is possible that *p1* is involved in η2. Therefore the type system ensures that there is a handler for {*p1*, *p2*}, either at this level or at an outer level.

(I) explains that a process finishing its default action (Pb) cannot leave its current try-handle (κ, ∅) immediately because other participants may fail (Pa failed). Equation 1 below also shows this issue from the perspective of the semantics:

$$\begin{aligned} \mathsf{s}[p] &: \mathsf{t}(0)\mathsf{h}(F : q!l(10).q?l'(x))^{(\kappa, \emptyset)}.\eta' \mid \mathsf{s}[q] : \mathsf{t}(p?l(x').p!l'(x'+10))\mathsf{h}(\mathsf{H})^{(\kappa, F)}.\eta'' \\ &\mid \mathsf{s} : [q, \mathsf{crash}\ F] \cdot [p, \mathsf{crash}\ F] \cdot h \end{aligned} \tag{1}$$

In Eq. 1 the process acting on s[*p*] has finished the actions of its try-handle (i.e., the action in the try-block is 0); if s[*p*] were to simply exit its try-handle, the participant acting on s[*q*], which started handling F, would be stuck.

To solve the issue, we use (**SndDone**) and (**RcvDone**) for completing a local try-handle with the help of a coordinator. The rule (**SndDone**) sends out a done notification ⟨*p*, ψ⟩^φ if the current action in φ is 0 and sets the action to the yield term 0̲, indicating that a done notification from the coordinator is needed for ending the try-handle.

Assume the process on channel s[*p*] has finished its local actions in the try-block (i.e., as in Eq. 1 above); then by (**SndDone**), we have

$$\begin{aligned} (1) \to\ & \mathsf{s} : [q, \mathsf{crash}\ F] \cdot [p, \mathsf{crash}\ F] \cdot \langle p, \psi \rangle^{(\kappa, \emptyset)} \cdot h \mid \\ & \mathsf{s}[p] : \mathsf{t}(\underline{\mathsf{0}})\mathsf{h}(F : q!l(10).q?l'(x))^{(\kappa, \emptyset)}.\eta' \mid \mathsf{s}[q] : \mathsf{t}(p?l(x').p!l'(x'+10))\mathsf{h}(\mathsf{H})^{(\kappa, F)}.\eta'' \end{aligned}$$

where notification ⟨*p*, ψ⟩^(κ,∅) is added to inform the coordinator. Now the process on channel s[*p*] can still handle failures defined in its handling environment. This is similar to the case described in (II).

Rule (**RcvDone**) is the counterpart of (**SndDone**). Once a process receives a done notification for φ from the coordinator, it can finish the try-handle φ and reduce to the continuation η. Consider Eq. 2 below, which is similar to Eq. 1, but here the try-handle can be reduced with (**RcvDone**); in Eq. 2, (**SndDone**) has been applied to both processes:

$$\begin{aligned} \mathsf{s}[p] &: \mathsf{t}(\underline{\mathsf{0}})\mathsf{h}(F : q!l(10).q?l'(x))^{(\kappa, \emptyset)}.\eta' \mid \\ \mathsf{s}[q] &: \mathsf{t}(\underline{\mathsf{0}})\mathsf{h}(F : p?l(x').p!l'(x'+10))^{(\kappa, \emptyset)}.\eta'' \mid \mathsf{s} : h \end{aligned} \tag{2}$$

With h = ⟨ψ, *q*⟩^(κ,∅) · ⟨ψ, *p*⟩^(κ,∅) · [*q*, crash F] · [*p*, crash F], both processes can apply (**RcvDone**) and safely terminate the try-handle (κ, ∅). Note that *Fset*(h, *p*) = *Fset*(h, *q*) = ∅ (by Definition 2), i.e., rule (**TryHdl**) cannot be applied, since a done notification suppresses the failure notification. Thus Eq. 2 will reduce to:

$$(2) \to^{*} \mathsf{s}[p] : \eta' \mid \mathsf{s}[q] : \eta'' \mid \mathsf{s} : [q, \mathsf{crash}\ F] \cdot [p, \mathsf{crash}\ F]$$

It is possible that η' or η'' have handlers for F. Note that once a queue contains ⟨ψ, *p*⟩^(κ,∅), all non-failed processes in the try-handle (κ, ∅) have sent done notifications to ψ (i.e., applied rule (**SndDone**)). The coordinator, which will be introduced shortly, ensures this.

$$\frac{\tilde{p} = roles(G) \setminus F' \qquad F' = F \cup \{p\} \qquad m = [\tilde{p}, \mathsf{crash}\ \{p\}]}{G : (F, d) \bullet N \mid \mathsf{s} : [\psi, \mathsf{crash}\ \{p\}] \cdot h \to G : (F', d) \bullet N \mid \mathsf{s} : h \cdot m} \tag{\mathsf{F}}$$

$$\begin{array}{c} d' = d \cdot \langle p, \psi \rangle^{\phi} \\ \hline G : (F, d) \bullet \mathsf{s} : \langle p, \psi \rangle^{\phi} \cdot h \to G : (F, d') \bullet \mathsf{s} : h \end{array} \text{ (\textbf{CollectDone})}$$

$$\frac{roles(d, \phi) \supseteq roles(G, \phi) \setminus F \qquad \forall F' \in hdl(G, \phi).\ (F' \nsubseteq F)}{G : (F, d) \bullet \mathsf{s} : h \to G : (F, remove(d, \phi)) \bullet \mathsf{s} : h \cdot \langle \psi, roles(G, \phi) \setminus F \rangle^{\phi}} \tag{\mathsf{IssueDone}}$$

**Fig. 8.** Operational semantics for the coordinator.

Rule (**Cln**) removes a normal message from the queue if the label in the message does not exist in the target process, which can happen when a failure handler was triggered. The function *labels*(η) returns all labels of receiving actions in η which are able to receive messages now or possibly later. This removal based on the syntactic process is safe because, in a global type, separate branch types *not* defined in the same default block or handler body must have disjoint sets of labels (cf. Sect. 3). Let φ ∈ P if try-handle φ appears inside P. Rule (**ClnDone**) removes a done notification of φ from the queue if no try-handle φ exists, which can happen in case of nesting when a handler of an outer try-handle is triggered.

*Handling at Coordinator.* Figure 8 defines the semantics of the coordinator. We first give the auxiliary definition of *roles*(G), which gives the set of *all* roles appearing in G.

In rule (**F**), F represents the failures that the coordinator is aware of. This rule states that the coordinator collects and removes a failure notification [ψ, crash {*p*}] heading to it, retains this information in G : (F', d) with F' = F ∪ {*p*}, and issues failure notifications to all non-failed participants.

Rules (**CollectDone**, **IssueDone**), in short, inform all participants in φ = (κ, F) to finish their try-handle φ if the coordinator has received sufficient done notifications for φ and did not send out failure notifications that interrupt the task (κ, F) (e.g., see (III)). Rule (**CollectDone**) collects done notifications, i.e., ⟨*p*, ψ⟩^φ, from the queue and retains these notifications; they are used in (**IssueDone**). Before introducing (**IssueDone**), we first define *hdl*(G, (κ, F)), which returns the set of handler signatures that can still be triggered with respect to the current handler:

**Definition 3.** *hdl*(G, (κ, F)) = *dom*(*H*) \ P(F) if t(G0)h(*H*)^κ ∈ G, where P(F) denotes the powerset of F.

Also, we overload the function *roles* to collect the non-coordinator roles of φ in d, written *roles*(d, φ); similarly, we write *roles*(G, φ), where φ = (κ, F), to collect the roles appearing in the handler body for F in the try-handle κ in G. Remember that d only contains done notifications sent by participants.

Rule (**IssueDone**) is applied for some φ when conditions ∀F' ∈ *hdl*(G, φ).(F' ⊈ F) and *roles*(d, φ) ⊇ *roles*(G, φ) \ F are both satisfied, where F contains all failures the coordinator is aware of. Intuitively, these two conditions ensure that (1) the coordinator only issues done notifications to the participants in the try-handle φ if it did not send failure notifications which will trigger a handler of the try-handle φ; and (2) the coordinator has received all done notifications from all non-failed participants of φ. We further explain both conditions in the following examples, starting with condition ∀F' ∈ *hdl*(G, φ).(F' ⊈ F), which ensures that no handler in φ can be triggered based on the failure notifications F sent out by the coordinator.

*Example 4.* Assume a process playing role *pi* is Pi = s[*pi*] : t(ηi)h(Hi)^φi, where i ∈ {1, 2, 3} and Hi = {*p2*} : ηi2, {*p3*} : ηi3, {*p2*, *p3*} : ηi23, and the coordinator is G : ({*p2*, *p3*}, d), where t(...)h(*H*)^κ ∈ G, *dom*(*H*) = *dom*(Hi) for any i ∈ {1, 2, 3}, and d = ⟨*p1*, ψ⟩^(κ,{*p2*}) · ⟨*p1*, ψ⟩^(κ,{*p2*,*p3*}) · d'. For any φ in d, the coordinator checks whether it has issued any failure notification that can possibly trigger a new handler of φ:

1. For φ = (κ, {*p2*}) the coordinator issued failure notifications that can interrupt the handling, since

$$hdl(G, (\kappa, \{p\_2\})) = dom(\mathcal{H}) \backslash \mathcal{P}(\{p\_2\}) = \{\{p\_3\}, \{p\_2, p\_3\}\}$$

and {*p2*, *p3*} ⊆ {*p2*, *p3*}. That means the failure notifications issued by the coordinator, i.e., for {*p2*, *p3*}, can trigger the handler with signature {*p2*, *p3*}. Thus the coordinator will not issue done notifications for φ = (κ, {*p2*}). A similar case is visualized in Fig. 4 at time (2).

2. For φ = (κ, {*p2*, *p3*}) the coordinator did not issue failure notifications that can interrupt a handler, since

$$hdl(G, (\kappa, \{p\_2, p\_3\})) = dom(\mathcal{H}) \backslash \mathcal{P}(\{p\_2, p\_3\}) = \emptyset$$

so that ∀F' ∈ *hdl*(G, (κ, {*p2*, *p3*})).(F' ⊈ {*p2*, *p3*}) is vacuously true. The coordinator will issue done notifications for φ = (κ, {*p2*, *p3*}).
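Both cases of Example 4 can be replayed concretely. A minimal sketch (our encoding: *dom*(*H*) and the coordinator's known failures as Python frozensets; the function names are ours):

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets (the P(F) of Definition 3)."""
    s = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(s, r)
                                         for r in range(len(s) + 1))}

def hdl(dom_H, F):
    """Definition 3: handler signatures still triggerable from (kappa, F)."""
    return set(dom_H) - powerset(F)

def may_issue_done(dom_H, F, F_known):
    """First (IssueDone) premise: no still-triggerable signature is covered
    by the failures F_known that the coordinator has already announced."""
    return all(not Fp <= F_known for Fp in hdl(dom_H, F))

dom_H = {frozenset({"p2"}), frozenset({"p3"}), frozenset({"p2", "p3"})}
aware = frozenset({"p2", "p3"})
assert not may_issue_done(dom_H, frozenset({"p2"}), aware)    # case 1: blocked
assert may_issue_done(dom_H, frozenset({"p2", "p3"}), aware)  # case 2: allowed
```

Case 1 is blocked because {*p2*, *p3*} (and also {*p3*}) is still triggerable and already covered by the announced failures; case 2 goes through because *hdl* is empty and the condition holds vacuously.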

The other condition, *roles*(d, φ) ⊇ *roles*(G, φ) \ F, states that only when the coordinator sees sufficient done notifications (in d) for φ does it issue done notifications to *all* non-failed participants in φ, i.e., ⟨ψ, *roles*(G, φ) \ F⟩^φ. Recall that *roles*(d, φ) returns all roles which have sent a done notification for the handling of φ, and *roles*(G, φ) returns all roles involved in the handling of φ. Intuitively, one might expect the condition to be *roles*(d, φ) = *roles*(G, φ); the following example shows why this would be wrong.

*Example 5.* Consider a process <sup>P</sup> acting on channel <sup>s</sup>[*p*] and {*q*} ∈ *dom*(H):

$$P = \mathsf{s}[p] : \mathsf{t}(\ldots \mathsf{t}(\ldots)\mathsf{h}(\{q\} : \eta, \mathsf{H}')^{\phi'}.\eta')\mathsf{h}(\mathsf{H})^{\phi}$$

Assume P has already reduced to:

$$P = \mathbf{s}[p] : \mathbf{t}(\underline{0})\mathbf{h}(\mathsf{H})^{\phi}$$

We show why *roles*(d, φ) ⊇ *roles*(G, φ) \ F is necessary. We start with the simple cases and then move to the more involved ones.

**Fig. 9.** The grammar of local types.


Thus rule (**IssueDone**) has the condition *roles*(d, φ) <sup>⊇</sup> *roles*(G, φ) \ <sup>F</sup> because of cases like (b) and (c).

The interplay between the issuing of done notifications (**IssueDone**) and the issuing of failure notifications (**F**) is non-trivial. The following proposition states that the participants in the same try-handle φ never get confused between handling failures and completing the try-handle φ.

**Proposition 1.** *Given* s : h *with* h = h' · ⟨ψ, *p*⟩^φ · h'' *and Fset*(h, *p*) = ∅*, the rule (**TryHdl**) is not applicable for the try-handle* φ *at the process playing role p.*

### **5 Local Types**

Figure 9 defines local types for typing behaviors of endpoint processes with failure handling. Type p! is the primitive for a sending type, and p? is the primitive for a receiving type, derived from global type p → q {li(Si).Gi}i∈I by projection. Others correspond straightforwardly to process terms. Note that type end only appears in *runtime* type checking. Below we define G↾*p*, which projects a global type G onto *p*, thus generating *p*'s local type.

**Definition 4 (Projection).** Consider a well-formed top-level global type [q̃]G. Then G↾*p* is defined as follows:

$$(1)\ G {\upharpoonright} p \text{ where } G = \mathsf{t}(G\_0)\mathsf{h}(F\_1 : G\_1, \ldots, F\_n : G\_n)^{\kappa}.G' = \begin{cases} \mathsf{t}(G\_0 {\upharpoonright} p)\mathsf{h}(F\_1 : G\_1 {\upharpoonright} p, \ldots, F\_n : G\_n {\upharpoonright} p)^{(\kappa, \emptyset)}.G' {\upharpoonright} p & \text{if } p \in roles(G) \\ G' {\upharpoonright} p & \text{otherwise} \end{cases}$$

$$(2)\ p\_1 \to p\_2\{l\_i(S\_i).G\_i\}\_{i \in I} {\upharpoonright} p = \begin{cases} p\_2!\{l\_i(S\_i).G\_i {\upharpoonright} p\}\_{i \in I} & \text{if } p = p\_1 \\ p\_1?\{l\_i(S\_i).G\_i {\upharpoonright} p\}\_{i \in I} & \text{if } p = p\_2 \\ G\_1 {\upharpoonright} p & \text{if } \forall i, j \in I.\ G\_i {\upharpoonright} p = G\_j {\upharpoonright} p \end{cases}$$

(3) (μ*t*.G)↾*p* = μ*t*.(G↾*p*) if ∄ t(G')h(*H*')^κ ∈ G and G↾*p* ≠ *t'* for any *t'* (4) *t*↾*p* = *t* (5) end↾*p* = end

Otherwise it is undefined.

The main rule is (1): if *p* appears somewhere in the target try-handle global type, then the endpoint type has a try-handle annotated with κ and the default logic (i.e., F = ∅). Note that even if G0↾*p* = end, the endpoint still gets such a try-handle because it needs to be ready for (possible) failure handling; if *p* does not appear anywhere in the target try-handle global type, then the projection skips to the continuation.

Rule (2) produces local types for interaction endpoints. If the endpoint is a sender (i.e., *p* = *p1*), then its local type abstracts that it will send something from one of the possible internal choices defined in {li(Si)}i∈I to *p2*, then continue as Gk↾*p*, gained from the projection, if k ∈ I is chosen. If the endpoint is a receiver (i.e., *p* = *p2*), then its local type abstracts that it will receive something from one of the possible external choices defined in {li(Si)}i∈I sent by *p1*; the rest is as for the sender. However, if *p* is not in this interaction, then its local type starts from the next interaction in which *p* takes part; moreover, because *p* does not know which choice *p1* has made, every path Gi↾*p* led to by branch li must be the same for *p* to ensure that interactions are consistent. For example, in G = *p1* → *p2*{l1(S1).*p3* → *p1* l3(S), l2(S2).*p3* → *p1* l4(S)}, interaction *p3* → *p1* continues after *p1* → *p2* takes place. If l3 ≠ l4, then G is not projectable for *p3* because *p3* does not know which branch *p1* has chosen; if *p1* chose branch l1, but *p3* (blindly) sends out label l4 to *p1*, this is a mistake for *p1* (but not for *p3*) because *p1* is expecting to receive label l3. To prevent such inconsistencies, we adopt the projection algorithm proposed in [24].
Other session type works [17,39] provide ways to weaken the classical restriction on the projection of branching that we use.
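The branch-consistency requirement of rule (2) can be sketched as a small check. A minimal sketch in which branch continuations are already-projected local types for the queried role (plain strings here; this encoding is ours, not the paper's):

```python
# Hypothetical sketch of projection rule (2) for p1 -> p2 {l_i(S_i).G_i}.
def project_interaction(p1, p2, branches, p):
    """branches maps a label l_i to (sort S_i, continuation projected on p)."""
    if p == p1:                                # sender: internal choice
        return ("send", p2, dict(branches))
    if p == p2:                                # receiver: external choice
        return ("recv", p1, dict(branches))
    conts = {cont for _, cont in branches.values()}
    if len(conts) != 1:                        # all G_i must project equally
        raise ValueError(f"not projectable: {p} cannot tell the branches apart")
    return conts.pop()

# The example from the text: p3's continuation is the same under l1 and l2
# (here p3 next sends l3 to p1), so the interaction is projectable for p3.
same = {"l1": ("S1", "p1!l3(S)"), "l2": ("S2", "p1!l3(S)")}
assert project_interaction("p1", "p2", same, "p3") == "p1!l3(S)"
```

With distinct continuations (the l3 ≠ l4 situation above), the check for the uninvolved role *p3* fails, mirroring why G is then not projectable.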

Rule (3) forbids a try-handle to appear in a recursive body, e.g., μ*t*.t(G)h(F : *t*)^κ.G' is not allowed, but t(μ*t*.G)h(*H*)^κ and t(G)h(F : μ*t*.G', *H*')^κ are allowed. This is because κ is used to avoid confusion of messages from different try-handles. If a recursive body contained a try-handle, we would have to dynamically generate different levels to maintain interaction consistency, so static type checking would not suffice. We are investigating alternative runtime checking mechanisms, but this is beyond the scope of this paper. The other rules are straightforward.

*Example 6.* Recall the global type G from Fig. 2 in Sect. 1. Applying the projection rules of Definition 4 to G for every role in G, we obtain the following:

$$\begin{cases} T\_{dfs} = G{\upharpoonright}dfs = \mathfrak{t}\left(\mu t.w\_{1}!l\_{d\_{1}}(S).w\_{2}!l\_{d\_{2}}(S).w\_{1}?l\_{r\_{1}}(S').w\_{2}?l\_{r\_{2}}(S').t\right)\mathfrak{h}(\mathcal{H}\_{dfs})^{(1,\emptyset)}\\ \mathcal{H}\_{dfs} = \{w\_{1}\}:\mu t'.w\_{2}!l'\_{d\_{1}}(S).w\_{2}?l'\_{r\_{1}}(S').t',\\ \qquad\quad \{w\_{2}\}:\mu t''.w\_{1}!l'\_{d\_{2}}(S).w\_{1}?l'\_{r\_{2}}(S').t'',\ \{w\_{1},w\_{2}\}:\mathsf{end}\\ T\_{w\_{1}} = G{\upharpoonright}w\_{1} = \mathfrak{t}\left(\mu t.dfs?l\_{d\_{1}}(S).dfs!l\_{r\_{1}}(S').t\right)\mathfrak{h}(\mathcal{H}\_{w\_{1}})^{(1,\emptyset)}\\ \mathcal{H}\_{w\_{1}} = \{w\_{1}\}:\mathsf{end},\ \{w\_{2}\}:\mu t'.dfs?l'\_{d\_{2}}(S).dfs!l'\_{r\_{2}}(S').t',\ \{w\_{1},w\_{2}\}:\mathsf{end}\\ T\_{w\_{2}} = G{\upharpoonright}w\_{2} = \mathfrak{t}\left(\mu t.dfs?l\_{d\_{2}}(S).dfs!l\_{r\_{2}}(S').t\right)\mathfrak{h}(\mathcal{H}\_{w\_{2}})^{(1,\emptyset)}\\ \mathcal{H}\_{w\_{2}} = \{w\_{1}\}:\mu t''.dfs?l'\_{d\_{1}}(S).dfs!l'\_{r\_{1}}(S').t'',\ \{w\_{2}\}:\mathsf{end},\ \{w\_{1},w\_{2}\}:\mathsf{end}\end{cases}$$

$$\frac{\begin{array}{c}\Gamma \vdash \eta \rhd \{\mathtt{c}:T\} \qquad \Gamma \vdash \eta' \rhd \{\mathtt{c}:T'\} \qquad dom(H) = dom(\mathcal{H})\\ \forall F \in dom(H).\ \Gamma \vdash H(F) \rhd \{\mathtt{c}:\mathcal{H}(F)\}\end{array}}{\Gamma \vdash \mathtt{t}(\eta)\mathtt{h}(H)^{\phi}.\eta' \rhd \{\mathtt{c}:\mathtt{t}(T)\mathtt{h}(\mathcal{H})^{\phi}.T'\}}\ \text{T-th}$$

**Fig. 10.** Typing rules for processes

### **6 Type System**

Next we introduce our type system for typing processes. Figures 10 and 11 present typing rules for endpoint processes and typing judgments for applications and systems, respectively.

We define shared environments Γ to keep information on variables and the coordinator, and session environments Δ to keep information on endpoint types:

$$\begin{array}{ll} \Gamma ::= \emptyset \mid \Gamma, X:S\ T \mid \Gamma, x:S \mid \Gamma, a:G \mid \Gamma, \Psi & \Delta ::= \emptyset \mid \Delta, \mathtt{c}:T \mid \Delta, \mathtt{s}:\mathsf{h} \\ \mathsf{m} ::= \langle p,q,l(S)\rangle \mid \langle p, \mathsf{crash}\ F \rangle \mid \langle p,q\rangle^{\phi} & \mathsf{h} ::= \emptyset \mid \mathsf{h} \cdot \mathsf{m} \end{array}$$

Γ maps process variables X and content variables x to their types, shared names a to global types G, and a coordinator *Ψ* = G : (F, d) to the failures and done notifications it has observed. Δ maps session channels c to local types and session queues to queue types. We write Γ, Γ′ = Γ ∪ Γ′ when *dom*(Γ) ∩ *dom*(Γ′) = ∅, and similarly for Δ, Δ′. Queue types h are composed of message types m; their permutation is defined analogously to the permutation for messages. The typing judgment for local processes Γ ⊢ P ▷ Δ states that process P is well-typed by Δ under Γ.

Since we do not define sequential composition for processes, our type system implicitly forbids session interleaving via T-ini. This differs from other session type works [15,24], where session interleaving is prohibited to obtain the progress property; here the restriction is inherent to the type system.

Figure 10 lists our typing rules for endpoint processes. Rule T-ini says that if a process's set of actions is well-typed by G↾*p* on some c, this process can play role *p* in a, which claims to have interactions obeying the behaviors defined in G. Here G must be closed, i.e., devoid of type variables. This rule forbids

**Fig. 11.** Typing rules for applications and systems.

a[*p*].b[*q*].P because a process can only use one session channel. Rule T-snd states that a sending action is well-typed against a sending type if the label and the type of the content are as expected; T-rcv states that a branching action (i.e., a receiving action) is well-typed against a branching type if all labels and the types of contents are as expected. Their follow-up actions must also be well-typed. Rule T-0 types an idle process. The predicate end-only Δ states that all endpoints in Δ have type end:

**Definition 5 (End-only** Δ**).** We say Δ is end-only if and only if ∀s[*p*] ∈ *dom*(Δ). Δ(s[*p*]) = end.

Rule T-yd types yielding actions, which only appear at runtime. Rule T-if is standard: the process is well-typed by Δ if e has boolean type and its sub-processes (i.e., η1 and η2) are well-typed by Δ. Rules T-var and T-def are based on a recent summary of MPSTs [14]; note that T-def forbids the type μ*t*.*t*. Rule T-th states that a try-handle is well-typed if it is annotated with the expected level φ, its default statement is well-typed, H and 𝓗 have the same handler signatures, and all handling actions are well-typed.

Figure 11 shows typing rules for applications and systems. Rule T-∅ types an empty queue. Rules T-m, T-D, and T-F type messages based on their shapes. Rule T-pa says two applications composed in parallel are well-typed if they do not share any session channel. Rule T-s says a part of a system S can start a private session, say s, if S is well-typed according to a Γ ⊢ Δs that is *coherent* (defined shortly). The system (νs)S, with a part becoming private in s, is well-typed by Δ \ Δs, that is, Δ after removing Δs.

### **Definition 6 (A Session Environment Having s Only: Δs)**

$$\Delta\_{\mathbf{s}} = \{ \mathbf{s}[p] : T \mid \mathbf{s}[p] \in dom(\Delta) \} \cup \{ \mathbf{s} : \mathbf{h} \mid \mathbf{s} \in dom(\Delta) \}.$$

Rule T-sys says that a system *Ψ* ⊩ N is well-typed if application N is well-typed and there exists a coordinator *Ψ* for handling this application. We say Δ is coherent under Γ if the local types of all endpoints are dual to each other after their local types are updated with the messages and notifications in s : h.

*Coherence.* We say that a session environment is *coherent* if, at any time, given a session with its latest messages and notifications, every endpoint participating in it is able to find someone to interact with (i.e., its dual party exists), either immediately or later.

*Example 7.* Continuing with Example 6: the session environment Γ ⊢ Δ below is coherent even though *w<sup>2</sup>* will not receive any message from *dfs* at this point. The only possible action in Δ is for *dfs* to send a message to *w<sup>1</sup>*. When this action fires, Δ reduces to Δ′ under a coordinator. (The reduction relation Γ ⊢ Δ →T Γ′ ⊢ Δ′, where Γ = Γ0, *Ψ* and Γ′ = Γ0, *Ψ′*, is defined based on the rules of the operational semantics of applications in Sect. 4, Figs. 6 and 7.) In Δ′, which abstracts the environment after *dfs* has sent this message, *w<sup>1</sup>* will be able to receive it.

$$\begin{array}{l} \Delta = \mathsf{s}[dfs]: T\_{dfs},\ \mathsf{s}[w\_{1}]: T\_{w\_{1}},\ \mathsf{s}[w\_{2}]: T\_{w\_{2}},\ \mathsf{s}: \emptyset\\ \Delta' = \mathsf{s}[dfs]: \mathsf{t}(w\_{2}!l\_{d\_{2}}(S).w\_{1}?l\_{r\_{1}}(S').w\_{2}?l\_{r\_{2}}(S').T)\mathsf{h}(\mathcal{H})^{(1,\emptyset)},\\ \qquad\ \ \mathsf{s}[w\_{1}]: T\_{w\_{1}},\ \mathsf{s}[w\_{2}]: T\_{w\_{2}},\ \mathsf{s}: \langle dfs, w\_{1}, l\_{d\_{1}}(S) \rangle\\ \text{where } T = \mu t.w\_{1}!l\_{d\_{1}}(S).w\_{2}!l\_{d\_{2}}(S).w\_{1}?l\_{r\_{1}}(S').w\_{2}?l\_{r\_{2}}(S').t \end{array}$$

We write s[*p*] : T ⋈ s[*q*] : T′ to state that the actions of the two types are *dual*:

**Definition 7 (Duality).** We define s[*p*] : T ⋈ s[*q*] : T′ by the following rules:

$$\begin{array}{c} \mathtt{s}[p]:\mathsf{end} \bowtie \mathtt{s}[q]:\mathsf{end} \qquad \mathtt{s}[p]:t \bowtie \mathtt{s}[q]:t \qquad \dfrac{\mathtt{s}[p]:T \bowtie \mathtt{s}[q]:T'}{\mathtt{s}[p]:\mu t.T \bowtie \mathtt{s}[q]:\mu t.T'}\\[8pt] \dfrac{\forall i \in I.\ \mathtt{s}[p]:T\_i \bowtie \mathtt{s}[q]:T'\_i}{\mathtt{s}[p]:q!\,\{l\_i(S\_i).T\_i\}\_{i\in I} \bowtie \mathtt{s}[q]:p?\,\{l\_i(S\_i).T'\_i\}\_{i\in I}}\\[8pt] \dfrac{\mathtt{s}[p]:T\_1 \bowtie \mathtt{s}[q]:T\_2 \quad \mathtt{s}[p]:T'\_1 \bowtie \mathtt{s}[q]:T'\_2 \quad dom(\mathcal{H}\_1)=dom(\mathcal{H}\_2) \quad \forall F\in dom(\mathcal{H}\_1).\ \mathtt{s}[p]:\mathcal{H}\_1(F) \bowtie \mathtt{s}[q]:\mathcal{H}\_2(F)}{\mathtt{s}[p]:\mathtt{t}(T\_1)\mathtt{h}(\mathcal{H}\_1)^{\phi}.T'\_1 \bowtie \mathtt{s}[q]:\mathtt{t}(T\_2)\mathtt{h}(\mathcal{H}\_2)^{\phi}.T'\_2} \end{array}$$
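Definition 7 can be read as a structural check over the two local types. The following sketch implements that reading on a toy tuple encoding of local types (the encoding and function names are ours, not the paper's formal syntax):

```python
def dual(T, U):
    """Check s[p]:T ⋈ s[q]:U, mirroring the duality rules (toy encoding)."""
    tag, utag = T[0], U[0]
    if tag == "end" and utag == "end":
        return True
    if tag == "var" and utag == "var":          # t ⋈ t
        return T[1] == U[1]
    if tag == "mu" and utag == "mu":            # μt.T ⋈ μt.T'
        return T[1] == U[1] and dual(T[2], U[2])
    if tag == "send" and utag == "recv":        # q!{l_i(S_i).T_i} ⋈ p?{...}
        bs, cs = T[2], U[2]                     # same labels and sorts,
        return bs.keys() == cs.keys() and all(  # pairwise-dual continuations
            bs[l][0] == cs[l][0] and dual(bs[l][1], cs[l][1]) for l in bs)
    if tag == "recv" and utag == "send":
        return dual(U, T)
    if tag == "th" and utag == "th":            # try-handles: same level φ,
        _, T1, H1, phi1, C1 = T                 # dual defaults/continuations,
        _, T2, H2, phi2, C2 = U                 # dual handlers for every F
        return (phi1 == phi2 and dual(T1, T2) and dual(C1, C2)
                and H1.keys() == H2.keys()
                and all(dual(H1[F], H2[F]) for F in H1))
    return False
```

For instance, `("send", "q", {"l": ("S", ("end",))})` is dual to `("recv", "p", {"l": ("S", ("end",))})`, and two try-handles are dual only when their levels agree and every handler branch is dual.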

The operation T↾*p* filters T down to the partial type containing only the actions of *p*. For example, (p1!l′(S′).p2!l(S))↾*p*2 = p2!l(S), and (p1!{T1, T2})↾*p*2 = p2?l(S) where T1 = l1(S1).p2?l(S) and T2 = l2(S2).p2?l(S). Next, we define (h)*p*→*q* to filter h down to (1) the normal message types sent from *p* to *q*, and (2) the notifications heading to *q*. For example, (⟨*p*, *q*, l(S)⟩ · [*q*, crash F] · ⟨ψ, *q*⟩<sup>φ</sup> · [*p*, crash F])*p*→*q* = *p*?l(S) · [F] · ⟨ψ⟩<sup>φ</sup>. The message types are abbreviated to contain only the necessary information.

We define T − h̃ to mean the effect of the message types h̃ on T. The concept is similar to the *session remainder* defined in [35], which returns the new local types of participants after they consume messages from the global queue. Since failure notifications are not consumed in our system, and we only have to observe the change of a participant's type after receiving, or being triggered by, some message types in h̃, we say that T − h̃ represents the effect of h̃ on T. The behavior follows our operational semantics of applications and systems defined in Figs. 6, 7, and 8. For example, t(*q*?{li(Si).Ti}i∈I)h(𝓗)<sup>φ</sup>.T′ − *q*?lk(Sk)·h̃ = t(Tk)h(𝓗)<sup>φ</sup>.T′ − h̃ where k ∈ I.
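The example rule above — consuming *q*?lk(Sk) resolves the external choice in the try-handle's default type — can be sketched as follows, reusing the same toy tuple encoding of local types as before (ours, purely illustrative):

```python
def effect(T, msgs):
    """Compute T - h~ for received messages msgs = [(q, label, sort), ...],
    where T is a try-handle whose default type is an external choice."""
    for (q, label, sort) in msgs:
        tag, body, H, phi, cont = T
        # only the case t(q?{l_i(S_i).T_i})h(H)^φ.T' - q?l_k(S_k)·h~
        assert tag == "th" and body[0] == "recv" and body[1] == q
        branches = body[2]
        assert label in branches and branches[label][0] == sort  # k ∈ I
        T = ("th", branches[label][1], H, phi, cont)  # continue as T_k
    return T
```

Applying a received message type peels off one external choice per message while leaving the handling environment 𝓗, the level φ, and the continuation untouched, matching the rule above.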

Now we define what it means for Δ to be coherent under Γ:

### **Definition 8 (Coherence).** Γ ⊢ Δ is coherent if the following conditions hold:

1. For every session s in Δ there exists a coordinator *Ψ* = G : (F, d) for s in Γ.
2. For any two endpoints s[*p*] : T and s[*q*] : T′ in Δ with queue s : h, we have s[*p*] : T↾*q* − (h)*q*→*p* ⋈ s[*q*] : T′↾*p* − (h)*p*→*q*.

In condition 1, we require a coordinator for every session so that, when a failure occurs, the coordinator can announce failure notifications asking participants to handle the failure. Condition 2 requires that, for any two endpoints s[*p*] and s[*q*] in Δ, the equation s[*p*] : T↾*q* − (h)*q*→*p* ⋈ s[*q*] : T′↾*p* − (h)*p*→*q* must hold. This condition asserts that the interactions of non-failed endpoints are dual to each other after the effect of h; failed endpoints are removed from Δ, so for them the condition is satisfied immediately.

### **7 Properties**

We show that our type system ensures subject congruence, subject reduction, and progress. All auxiliary definitions and proofs are in the long version [43].

The property of subject congruence states that if S (a system containing an application and a coordinator) is well-typed by some session environment, then any S′ that is structurally congruent to it is well-typed by the same session environment:

**Theorem 1 (Subject Congruence).** Γ ⊢ S ▷ Δ and S ≡ S′ imply Γ ⊢ S′ ▷ Δ.

Subject reduction states that a well-typed S (respectively, a coherent session environment) remains well-typed (respectively, coherent) after reduction:

### **Theorem 2 (Subject Reduction)**

– Γ ⊢ S ▷ Δ with Γ ⊢ Δ coherent and S →∗ S′ imply that there exist Δ′, Γ′ such that Γ′ ⊢ S′ ▷ Δ′, where Γ ⊢ Δ →∗T Γ′ ⊢ Δ′ or Δ ≡ Δ′, and Γ′ ⊢ Δ′ is coherent.
– Γ ⊢ S ▷ ∅ and S →∗ S′ imply that Γ′ ⊢ S′ ▷ ∅ for some Γ′.

We allow sessions to run in parallel at the top level, e.g., S = (νs1)(*Ψ*1 ⊩ N1) | ... | (νsn)(*Ψ*n ⊩ Nn). Assume we have S with a[*p*].P ∈ S. If we cannot apply rule (**Link**), S cannot reduce. To prevent this kind of situation, we require S to be *initializable*: ∀a[*p*].P ∈ S, (**Link**) is applicable.

The following property states that S never gets stuck (property of progress):

**Theorem 3 (Progress).** If Γ ⊢ S ▷ ∅ and S is initializable, then either S →∗ S′ and S′ is initializable, or S′ = *Ψ*1 ⊩ s1 : h1 | ... | *Ψ*n ⊩ sn : hn and h1, ..., hn only contain failure notifications sent by coordinators and messages heading to failed participants.

After all processes in S terminate, only failure notifications sent by coordinators are left; thus the final system can be of the form *Ψ*1 ⊩ s1 : h1 | ... | *Ψ*n ⊩ sn : hn, where h1, ..., hn contain failure notifications sent by coordinators, so that reduction rules (**CollectDone**), (**IssueDone**), and (**F**) will not be applied.

*Minimality.* The following proposition states that, when all roles defined in a global type G are robust, an application conforming to G never interacts with the coordinator (i.e., its interactions are equivalent to those without a coordinator). This is an important property: our model incurs no coordination overhead when all participants are robust, or in failure-agnostic contexts as considered in previous MPST works.

**Proposition 2.** Assume ∀*p* ∈ *roles*(G) = {*p*1, ..., *p*n}, *p* is robust, Pi = s[*p*i] : ηi for i ∈ {1..n}, and S = (νs)(*Ψ* ⊩ P1 | ... | Pn | s : h), where each Pi, i ∈ {1..n}, contains no try-handle. Then we have Γ ⊢ S ▷ ∅, and whenever S →∗ S′ we have *Ψ*′ ∈ S′ with *Ψ*′ = G : (∅, ∅).

*Proof.* Immediate from typing rules T-ini, T-s, and T-sys, Definition 4 (Projection), and the operational semantics defined in Figs. 6, 7, and 8.

### **8 Related Work**

Several session type works study exception handling [7,9,16,30]. However, to the best of our knowledge this is the first theoretical work to develop a formalism and typing discipline for the coordinator-based model of *crash failure* handling in practical asynchronous distributed systems.

Structured interactional exceptions [7] study exception handling for binary sessions. The work extends session types with a *try-catch* construct and a *throw* instruction, allowing participants to raise runtime exceptions. Global escape [6] extends previous works on exception handling in binary session types to MPSTs. It supports nesting and sequencing of try-catch blocks with restrictions. Reduction rules for exception handling are of the form Σ ⊢ P → Σ′ ⊢ P′, where Σ is the *exception environment*. This central environment at the core of the semantics is updated synchronously and atomically. Furthermore, the reduction of a try-catch block to its continuation is done in a synchronous reduction step involving all participants in a block. Lastly, this work can only handle exceptions, i.e., explicitly raised application-level failures; these do not affect communication channels [6], unlike participant crashes.

Similarly, our previous work [13] only deals with exceptions. An interaction p → q : S ∨ F defines that p can send a message of type S to q; if F is not empty, then instead of sending a message, p can throw F. If a failure is thrown, only participants that have causal dependencies on that failure are involved in the failure handling. No concurrent failures are allowed; therefore, all interactions that can raise failures are executed in a lock-step fashion. As a consequence, the model cannot be used to deal with crash failures.

Adameit et al. [1] propose session types for link failures, which extend session types with an optional block that surrounds a process and contains default values; the default values are used if a link failure occurs. In contrast to our work, their communication model is synchronous, whereas ours is asynchronous; moreover, the optional block merely returns default values in case of a failure, and it remains the developer's task to do something useful with them.

Demangeon et al. study interrupts in MPSTs [16]. This work introduces an interruptible block {|G|}<sup>c</sup>⟨l by r⟩; G′, identified by c; the protocol G can be interrupted by a message l from r, and G′ continues after either a normal or an interrupted completion of G. Interrupts are a control-flow construct akin to exceptions rather than an actual failure handling construct, and the semantics cannot model participant crashes.

Neykova and Yoshida [36] show that MPSTs can be used to calculate safe global states for a safe recovery in Erlang's *let it crash* model [2]. That work is well suited for the recovery of lightweight processes in an actor setting. However, while it allows for elaborate failure handling by connecting (endpoint) processes with runtime monitors, the model does not address the fault tolerance of the runtime monitors themselves. As monitors can interact in complex manners, replication does not seem straightforwardly applicable, at least not without potentially hampering performance (just as with *straightforward* replication of entire applications).

Failure handling is studied in several process calculi and communication-centered programming languages without a typing discipline. The conversation calculus [42] models exception behavior in abstract service-based systems with message-passing based communication. The work does not use channel types but studies the behavioral theory of bisimilarity. Error recovery is also studied in a concurrent object setting [45]; interacting objects are grouped into coordinated atomic actions (CAs), which enable safe error recovery. CAs, however, cannot be nested. PSYNC [18] is a domain-specific language based on the *heard-of* model of distributed computing [12]. Programs written in PSYNC are structured into rounds which are executed in a lock-step manner. PSYNC comes with a state-based verification engine which enables checking of safety and liveness properties; for that, programmers have to define non-trivial inductive invariants and ranking functions. In contrast to the coordinator model, the heard-of model is not widely deployed in practice. Verdi [44] is a framework for implementing and verifying distributed systems in Coq. It provides the possibility to verify the system against different network models. Verdi enables the verification of properties in an idealized fault model and then transfers the guarantees to more realistic fault models by applying transformation functions. Verdi supports safety properties but not liveness properties.

### **9 Final Remarks**

*Implementation.* Based on the presented calculus, we developed a domain-specific language and corresponding runtime system in Scala, using ZooKeeper as the coordinator. Specifically, our implementation provides mechanisms for (1) interacting with ZooKeeper as the coordinator, (2) done and failure notification delivery and routing, (3) practical failure detection and dealing with false suspicions, and (4) automatically inferring try-handle levels.
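As a rough illustration of items (1) and (2) — not the actual Scala/ZooKeeper implementation, whose API we do not model — a coordinator *Ψ* = G : (F, d) can be pictured as a small stateful object that records observed failures and done notifications and routes crash notifications to the remaining live endpoints:

```python
class Coordinator:
    """Toy coordinator Ψ = G : (F, d): tracks failures F and done
    notifications d, and broadcasts crash notifications (illustrative)."""
    def __init__(self, roles):
        self.live = set(roles)
        self.failures = set()   # F: failed roles announced so far
        self.done = set()       # d: roles that have reported 'done'
        self.queues = {r: [] for r in roles}

    def report_crash(self, role):
        if role in self.failures:
            return              # duplicate suspicion: already handled
        self.failures.add(role)
        self.live.discard(role)
        for r in self.live:     # failure notification delivery and routing
            self.queues[r].append(("crash", role))

    def report_done(self, role):
        self.done.add(role)
        return self.done == self.live   # all live endpoints finished
```

Deduplicating crash reports mirrors the fact that the coordinator announces each failure at most once; collecting done notifications corresponds to the (**CollectDone**)/(**IssueDone**) behavior sketched in the semantics.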

*Conclusions.* This work introduces a formal model of verified crash failure handling featuring a lightweight coordinator as common in many real-life systems. The model carefully exposes potential problems that may arise in distributed applications due to partial failures, such as inconsistent endpoint behaviors and orphan messages. Our typing discipline addresses these challenges by building on the mechanisms of MPSTs, e.g., global type well-formedness for sound failure handling specifications, modeling asynchronous permutations between regular messages and failure notifications in sessions, and the type-directed mechanisms for determining correct and orphaned messages in the event of failure. We adapt coherence of session typing environments (i.e., endpoint consistency) to consider failed roles and orphan messages, and show that our type system statically ensures subject reduction and progress in the presence of failures.

*Future Work.* We plan to expand our implementation and develop further applications. We believe dynamic role participation and role parameterization would be valuable for failure handling. Also, we are investigating options to enable addressing the coordinator as part of the protocol so that pertinent runtime information can be persisted by the coordinator. We plan to add support to our language and calculus for solving various explicit agreement tasks (e.g., consensus, atomic commit) via the coordinator.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **On Polymorphic Sessions and Functions A Tale of Two (Fully Abstract) Encodings**

Bernardo Toninho1,2(B) and Nobuko Yoshida<sup>2</sup>

<sup>1</sup> NOVA-LINCS, Departamento de Informática, FCT, Universidade Nova de Lisboa, Lisbon, Portugal

btoninho@fct.unl.pt

<sup>2</sup> Department of Computing, Imperial College London, London, UK

**Abstract.** This work exploits the logical foundation of session types to determine what kind of type discipline for the π-calculus can exactly capture, and is captured by, λ-calculus behaviours. Leveraging the proof theoretic content of the soundness and completeness of sequent calculus and natural deduction presentations of linear logic, we develop the first *mutually inverse* and *fully abstract* processes-as-functions and functionsas-processes encodings between a polymorphic session π-calculus and a linear formulation of System F. We are then able to derive results of the session calculus from the theory of the λ-calculus: (1) we obtain a characterisation of inductive and coinductive session types via their algebraic representations in System F; and (2) we extend our results to account for *value* and *process* passing, entailing strong normalisation.

### **1 Introduction**

Dating back to Milner's seminal work [29], encodings of λ-calculus into π-calculus are seen as essential benchmarks to examine expressiveness of various extensions of the π-calculus. Milner's original motivation was to demonstrate the power of link mobility by decomposing higher-order computations into pure name passing. Another goal was to analyse functional behaviours in a broad computational universe of concurrency and non-determinism. While *operationally* correct encodings of many higher-order constructs exist, it is challenging to obtain encodings that are precise wrt behavioural equivalence: the semantic distance between the λ-calculus and the π-calculus typically requires either restricting process behaviours [45] (e.g. via typed equivalences [5]) or enriching the λ-calculus with constants that allow for a suitable characterisation of the term equivalence induced by the behavioural equivalence on processes [43].

Encodings in π-calculi also gave rise to new typing disciplines: Session types [20,22], a typing system that is able to ensure deadlock-freedom for communication protocols between two or more parties [23], were originally motivated "from process encodings of various data structures in an asynchronous version of the π-calculus" [21]. Recently, a propositions-as-types correspondence between linear logic and session types [8,9,54] has produced several new developments and logically-motivated techniques [7,26,49,54] to augment both the theory and practice of session-based message-passing concurrency. Notably, parametric session polymorphism [7] (in the sense of Reynolds [41]) has been proposed and a corresponding abstraction theorem has been shown.

Our work expands upon the proof theoretic consequences of this propositions-as-types correspondence to address the problem of how to *exactly* match the behaviours induced by session π-calculus encodings of the λ-calculus with those of the λ-calculus. We develop *mutually inverse* and *fully abstract* encodings (up to typed observational congruences) between a polymorphic session-typed π-calculus and the polymorphic λ-calculus. The encodings arise from the proof theoretic content of the equivalence between sequent calculus (i.e. the session calculus) and natural deduction (i.e. the λ-calculus) for *second-order* intuitionistic linear logic, greatly generalising [49]. While fully abstract encodings between λ-calculi and π-calculi have been proposed (e.g. [5,43]), our work is the first to consider a two-way, *both* mutually inverse *and* fully abstract embedding between the two calculi by crucially exploiting the linear logic-based session discipline. This also sheds some definitive light on the nature of concurrency in the (logical) session calculi, which exhibit "don't care" forms of non-determinism (e.g. processes may race on stateless replicated servers) rather than "don't know" non-determinism (which requires less harmonious logical features [2]).

In the spirit of Gentzen [14], we use our encodings as a tool to study nontrivial properties of the session calculus, deriving them from results in the λ-calculus: We show the existence of inductive and coinductive sessions in the polymorphic session calculus by considering the representation of initial F-algebras and final F-coalgebras [28] in the polymorphic λ-calculus [1,19] (in a linear setting [6]). By appealing to full abstraction, we are able to derive processes that satisfy the necessary algebraic properties and thus form adequate *uniform* representations of inductive and coinductive session types. The derived algebraic properties enable us to reason about standard data structure examples, providing a logical justification to typed variations of the representations in [30].

We systematically extend our results to a session calculus with λ-term and process passing (the latter being the core calculus of [50], inspired by Benton's LNL [4]). By showing that our encodings naturally adapt to this setting, we prove that it is possible to encode higher-order process passing in the first-order session calculus fully abstractly, providing a typed and proof-theoretically justified re-envisioning of Sangiorgi's encodings of higher-order π-calculus [46]. In addition, the encoding instantly provides a strong normalisation property of the higher-order session calculus.

Contributions and the outline of our paper are as follows:

*§* **3.1** develops a functions-as-processes encoding of a linear formulation of System F, Linear-F, using a logically motivated polymorphic session π-calculus, Polyπ, and shows that the encoding is operationally sound and complete.

*§* **3.2** develops a processes-as-functions encoding of Poly<sup>π</sup> into Linear-F, arising from the completeness of the sequent calculus wrt natural deduction, also operationally sound and complete.

*§* **3.3** studies the relationship between the two encodings, establishing they are *mutually inverse* and *fully abstract* wrt typed congruence, the first two-way embedding satisfying *both* properties.

*§* **4** develops a *faithful* representation of inductive and coinductive session types in Polyπ via the encoding of initial and final (co)algebras in the polymorphic λ-calculus. We demonstrate a use of these algebraic properties via examples.

*§* **4.2** and **4.3** study term-passing and process-passing session calculi, extending our encodings to provide embeddings into the first-order session calculus. We show full abstraction and mutual inversion results, and derive strong normalisation of the higher-order session calculus from the encoding.

In order to introduce our encodings, we first overview Polyπ, its typing system and behavioural equivalence (*§* **2**). We discuss related work and conclude with future work (*§* **5**). Detailed proofs can be found in [52].

### **2 Polymorphic Session** *π***-Calculus**

This section summarises the polymorphic session π-calculus [7], dubbed Polyπ, arising as a process assignment to second-order linear logic [15], its typing system and behavioural equivalences.

#### **2.1 Processes and Typing**

**Syntax.** Given an infinite set Λ of names x, y, z, u, v, the grammar of processes P, Q, R and session types A, B, C is defined by:

$$\begin{array}{lcl}P,Q,R & ::= & \overline{x}\langle y\rangle.P \;\mid\; x(y).P \;\mid\; P \mid Q \;\mid\; (\nu y)P \;\mid\; [x \leftrightarrow y] \;\mid\; \mathbf{0} \\ & \mid & x\langle A\rangle.P \;\mid\; x(Y).P \;\mid\; x.\mathsf{inl}; P \;\mid\; x.\mathsf{inr}; P \;\mid\; x.\mathsf{case}(P,Q) \;\mid\; !x(y).P \\ A,B & ::= & \mathbf{1} \;\mid\; A \multimap B \;\mid\; A \otimes B \;\mid\; A \& B \;\mid\; A \oplus B \;\mid\; !A \;\mid\; \forall X.A \;\mid\; \exists X.A \;\mid\; X \end{array}$$

$\overline{x}\langle y\rangle.P$ denotes the output of channel $y$ on $x$ with continuation process $P$; $x(y).P$ denotes an input along $x$, bound to $y$ in $P$; $P \mid Q$ denotes parallel composition; $(\nu y)P$ denotes the restriction of name $y$ to the scope of $P$; **0** denotes the inactive process; $[x \leftrightarrow y]$ denotes the linking of the two channels $x$ and $y$ (implemented as renaming); $x\langle A\rangle.P$ and $x(Y).P$ denote the sending and receiving of a *type* $A$ along $x$, bound to $Y$ in $P$ at the receiver; $x.\mathsf{inl};P$ and $x.\mathsf{inr};P$ denote the emission of a selection between the left or right branch of a receiver $x.\mathsf{case}(P,Q)$ process; $!x(y).P$ denotes an input-guarded replication that spawns replicas upon receiving an input along $x$. We often abbreviate $(\nu y)\overline{x}\langle y\rangle.P$ to $\overline{x}(y).P$ and omit trailing **0** processes. By convention, we range over linear channels with $x, y, z$ and shared channels with $u, v, w$.

$$\begin{array}{llll} (\text{out}) & \overline{x}\langle y\rangle.P \xrightarrow{\;\overline{x}\langle y\rangle\;} P & (\text{in}) & x(y).P \xrightarrow{\;x(z)\;} P\{z/y\} \\[4pt] (\text{outT}) & x\langle A\rangle.P \xrightarrow{\;x\langle A\rangle\;} P & (\text{inT}) & x(Y).P \xrightarrow{\;x(B)\;} P\{B/Y\} \end{array}$$

**Fig. 1.** Labelled transition system.

The syntax of session types is that of (intuitionistic) linear logic propositions, which are assigned to channels according to their usages in processes: **1** denotes the type of a channel along which no further behaviour occurs; $A \multimap B$ denotes a session that waits to receive a channel of type $A$ and will then proceed as a session of type $B$; dually, $A \otimes B$ denotes a session that sends a channel of type $A$ and continues as $B$; $A \mathbin{\&} B$ denotes a session that offers a choice between proceeding as behaviours $A$ or $B$; $A \oplus B$ denotes a session that internally chooses to continue as either $A$ or $B$, signalling appropriately to the communicating partner; $!A$ denotes a session offering an unbounded (but finite) number of behaviours of type $A$; $\forall X.A$ denotes a polymorphic session that receives a type $B$ and behaves uniformly as $A\{B/X\}$; dually, $\exists X.A$ denotes an existentially typed session, which emits a type $B$ and behaves as $A\{B/X\}$.
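As a concrete reading of these constructors, consider the type of the polymorphic pairing session used later in Example 3.7:

$$\forall X.\forall Y.\, X \multimap Y \multimap X \otimes Y$$

A session of this type first inputs two types $X$ and $Y$, then inputs a session of type $X$ and a session of type $Y$, and finally outputs a session of type $X$, continuing as a session of type $Y$.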

**Operational Semantics.** The operational semantics of our calculus is presented as a standard labelled transition system (Fig. 1) in the style of the *early* system for the π-calculus [46].

In the remainder of this work we write $\equiv$ for a standard $\pi$-calculus structural congruence extended with the clause $[x \leftrightarrow y] \equiv [y \leftrightarrow x]$. In order to streamline the presentation of observational equivalence [7,36], we write $\equiv_!$ for structural congruence extended with the so-called sharpened replication axioms [46], which capture basic equivalences of replicated processes (and are present in the proof dynamics of the exponential of linear logic). A transition $P \xrightarrow{\alpha} Q$ denotes that $P$ may evolve to $Q$ by performing the action represented by label $\alpha$. An action $\alpha$ (resp. $\overline{\alpha}$) requires a matching $\overline{\alpha}$ (resp. $\alpha$) in the environment to enable progress. Labels include: the silent internal action $\tau$; output and bound output actions ($\overline{x}\langle y\rangle$ and $(\nu z)\overline{x}\langle z\rangle$); the input action $x(y)$; the binary choice actions ($x.\mathsf{inl}$, $\overline{x}.\mathsf{inl}$, $x.\mathsf{inr}$, and $\overline{x}.\mathsf{inr}$); and output and input actions of types ($x\langle A\rangle$ and $x(A)$).
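Collecting the actions just described, the labels $\alpha$ can be summarised by the grammar (where $A$ ranges over session types):

$$\alpha ::= \tau \mid \overline{x}\langle y\rangle \mid (\nu z)\overline{x}\langle z\rangle \mid x(y) \mid x.\mathsf{inl} \mid \overline{x}.\mathsf{inl} \mid x.\mathsf{inr} \mid \overline{x}.\mathsf{inr} \mid x\langle A\rangle \mid x(A)$$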

The labelled transition relation is defined by the rules in Fig. 1, subject to the following side conditions: in rule (res), we require $y \notin \mathit{fn}(\alpha)$; in rule (par), we require $\mathit{bn}(\alpha) \cap \mathit{fn}(R) = \emptyset$; in rule (close), we require $y \notin \mathit{fn}(Q)$. We omit the symmetric versions of (par), (com), (lout), (lin), (close) and closure under $\alpha$-conversion. We write $\rho_1\rho_2$ for the composition of relations $\rho_1, \rho_2$. We write $\longrightarrow$ to stand for $\xrightarrow{\tau}\equiv$.

$$\begin{array}{c}
(\multimap\mathsf{R})\ \dfrac{\Omega; \Gamma; \Delta, x{:}A \vdash P :: z{:}B}{\Omega; \Gamma; \Delta \vdash z(x).P :: z{:}A \multimap B}
\qquad
(\otimes\mathsf{R})\ \dfrac{\Omega; \Gamma; \Delta_1 \vdash P :: y{:}A \quad \Omega; \Gamma; \Delta_2 \vdash Q :: z{:}B}{\Omega; \Gamma; \Delta_1, \Delta_2 \vdash (\nu y)\overline{z}\langle y\rangle.(P \mid Q) :: z{:}A \otimes B}
\\[16pt]
(\forall\mathsf{R})\ \dfrac{\Omega, X; \Gamma; \Delta \vdash P :: z{:}A}{\Omega; \Gamma; \Delta \vdash z(X).P :: z{:}\forall X.A}
\qquad
(\forall\mathsf{L})\ \dfrac{\Omega \vdash B\ \mathsf{type} \quad \Omega; \Gamma; \Delta, x{:}A\{B/X\} \vdash P :: z{:}C}{\Omega; \Gamma; \Delta, x{:}\forall X.A \vdash x\langle B\rangle.P :: z{:}C}
\\[16pt]
(\exists\mathsf{R})\ \dfrac{\Omega \vdash B\ \mathsf{type} \quad \Omega; \Gamma; \Delta \vdash P :: z{:}A\{B/X\}}{\Omega; \Gamma; \Delta \vdash z\langle B\rangle.P :: z{:}\exists X.A}
\qquad
(\exists\mathsf{L})\ \dfrac{\Omega, X; \Gamma; \Delta, x{:}A \vdash P :: z{:}C}{\Omega; \Gamma; \Delta, x{:}\exists X.A \vdash x(X).P :: z{:}C}
\\[16pt]
(\mathsf{id})\ \dfrac{}{\Omega; \Gamma; x{:}A \vdash [x \leftrightarrow z] :: z{:}A}
\qquad
(\mathsf{cut})\ \dfrac{\Omega; \Gamma; \Delta_1 \vdash P :: x{:}A \quad \Omega; \Gamma; \Delta_2, x{:}A \vdash Q :: z{:}C}{\Omega; \Gamma; \Delta_1, \Delta_2 \vdash (\nu x)(P \mid Q) :: z{:}C}
\end{array}$$

**Fig. 2.** Typing rules (abridged – see [52] for all rules)

Weak transitions are defined as usual: we write $\Longrightarrow$ for the reflexive, transitive closure of $\xrightarrow{\tau}$ and $\longrightarrow^{+}$ for the transitive closure of $\xrightarrow{\tau}$. Given $\alpha \neq \tau$, the notation $\stackrel{\alpha}{\Longrightarrow}$ stands for $\Longrightarrow\xrightarrow{\alpha}\Longrightarrow$, and $\stackrel{\tau}{\Longrightarrow}$ stands for $\Longrightarrow$.

**Typing System.** The typing rules of Polyπ are given in Fig. 2, following [7]. The rules define the judgment $\Omega; \Gamma; \Delta \vdash P :: z{:}A$, denoting that process $P$ offers a session of type $A$ along channel $z$, using the *linear* sessions in $\Delta$, (potentially) using the unrestricted or *shared* sessions in $\Gamma$, with polymorphic type variables maintained in $\Omega$. We use a well-formedness judgment $\Omega \vdash A\ \mathsf{type}$, which states that $A$ is well-formed wrt the type variable environment $\Omega$ (i.e. $\mathit{fv}(A) \subseteq \Omega$). We often write $T$ for the right-hand side typing $z{:}A$, $\cdot$ for the empty context, and $\Delta, \Delta'$ for the union of contexts $\Delta$ and $\Delta'$, defined only when $\Delta$ and $\Delta'$ are disjoint. We write $\cdot \vdash P :: T$ for $\cdot; \cdot; \cdot \vdash P :: T$.

As in [8,9,36,54], the typing discipline enforces that channel outputs always have as object a *fresh* name, in the style of the internal mobility π-calculus [44]. We clarify a few of the key rules: rule $\forall\mathsf{R}$ defines the meaning of (impredicative) universal quantification over session types, stating that a session of type $\forall X.A$ inputs a type and then behaves uniformly as $A$; dually, to use such a session (rule $\forall\mathsf{L}$), a process must output a type $B$, which then warrants the use of the session at type $A\{B/X\}$. Rule $\multimap\mathsf{R}$ captures session input, where a session of type $A \multimap B$ expects to receive a session of type $A$ which will then be used to produce a session of type $B$. Dually, session output (rule $\otimes\mathsf{R}$) is achieved by producing a fresh session of type $A$ (that uses a disjoint set of sessions from those of the continuation) and outputting the fresh session along $z$, which is then a session of type $B$. Linear composition is captured by rule $\mathsf{cut}$, which enables a process that offers a session $x{:}A$ (using linear sessions in $\Delta_1$) to be composed with a process that *uses* that session (amongst others in $\Delta_2$) to offer $z{:}C$. As shown in [7], typing entails Subject Reduction, Global Progress, and Termination.

**Observational Equivalences.** We briefly summarise the typed congruence and logical equivalence with polymorphism, giving rise to a suitable notion of relational parametricity in the sense of Reynolds [41], defined as a contextual logical relation on typed processes [7]. The logical relation is reminiscent of a typed bisimulation. However, extra care is needed to ensure well-foundedness due to impredicative type instantiation. As a consequence, the logical relation allows us to reason about process equivalences where type variables are not instantiated with *the same*, but rather *related* types.

**Typed Barbed Congruence (**∼=**).** We use the typed contextual congruence from [7], which preserves *observable* actions, called barbs. Formally, *barbed congruence*, noted ∼=, is the largest equivalence on well-typed processes that is τ-closed, barb preserving, and contextually closed under typed contexts; see [7,52] for the full definition.

**Logical Equivalence (**≈L**).** The definition of logical equivalence is no more than a typed contextual bisimulation with the following intuitive reading: given two open processes P and Q (i.e. processes with non-empty left-hand side typings), we define their equivalence by inductively closing out the context, composing with equivalent processes offering appropriately typed sessions. When processes are closed, we have a single distinguished session channel along which we can perform observations, and proceed inductively on the structure of the offered session type. We can then show that such an equivalence satisfies the necessary fundamental properties (Theorem 2.3).

The logical relation is defined using the candidates technique of Girard [16]. In this setting, an *equivalence candidate* is a relation on typed processes satisfying basic closure conditions: an equivalence candidate must be compatible with barbed congruence and closed under forward and converse reduction.

**Definition 2.1 (Equivalence Candidate).** An *equivalence candidate* $\mathcal{R}$ at $z{:}A$ and $z{:}B$, noted $\mathcal{R} :: z{:}A \Leftrightarrow B$, is a binary relation on processes such that, for every $(P, Q) \in \mathcal{R} :: z{:}A \Leftrightarrow B$, both $\cdot \vdash P :: z{:}A$ and $\cdot \vdash Q :: z{:}B$ hold, together with the closure conditions sketched above (we often write $(P, Q) \in \mathcal{R} :: z{:}A \Leftrightarrow B$ as $P\ \mathcal{R}\ Q :: z{:}A \Leftrightarrow B$): $\mathcal{R}$ must be compatible with barbed congruence and closed under forward and converse reduction (see [52] for the precise conditions).


To define the logical relation we rely on some auxiliary notation pertaining to the treatment of type variables arising due to impredicative polymorphism. We write $\omega : \Omega$ to denote a mapping $\omega$ that assigns a closed type to each of the type variables in $\Omega$. We write $\omega(X)$ for the type mapped by $\omega$ to variable $X$. Given two mappings $\omega : \Omega$ and $\omega' : \Omega$, we define an equivalence candidate assignment $\eta$ between $\omega$ and $\omega'$ as a mapping of equivalence candidates $\eta(X) :: -{:}\omega(X) \Leftrightarrow \omega'(X)$ to the type variables in $\Omega$, where the particular choice of a distinguished right-hand side channel is *delayed* (i.e. to be instantiated later on). We write $\eta(X)(z)$ for the instantiation of the (delayed) candidate with the name $z$. We write $\eta : \omega \Leftrightarrow \omega'$ to denote that $\eta$ is a candidate assignment between $\omega$ and $\omega'$; and $\hat{\omega}(P)$ to denote the application of the mapping $\omega$ to $P$.

We define a sequent-indexed family of process relations, that is, a set of pairs of processes $(P, Q)$, written $\Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: T[\eta : \omega \Leftrightarrow \omega']$, satisfying some conditions, typed under $\Omega; \Gamma; \Delta \vdash T$, with $\omega : \Omega$, $\omega' : \Omega$ and $\eta : \omega \Leftrightarrow \omega'$. Logical equivalence is defined inductively on the size of the typing contexts and then on the structure of the right-hand side type. We show only selected cases (see [52] for the full definition).

**Definition 2.2 (Logical Equivalence). (Base Case)** Given a type $A$ and mappings $\omega, \omega', \eta$, we define *logical equivalence*, noted $P \approx_{\mathsf{L}} Q :: z{:}A[\eta : \omega \Leftrightarrow \omega']$, as the smallest symmetric binary relation containing all pairs of processes $(P, Q)$ such that (i) $\cdot \vdash \hat{\omega}(P) :: z{:}\hat{\omega}(A)$; (ii) $\cdot \vdash \hat{\omega}'(Q) :: z{:}\hat{\omega}'(A)$; and (iii) $(P, Q)$ satisfies the type-directed conditions, defined by induction on the structure of $A$ (see [52]).


**(Inductive Case).** Let $\Gamma, \Delta$ be non-empty. Given $\Omega; \Gamma; \Delta \vdash P :: T$ and $\Omega; \Gamma; \Delta \vdash Q :: T$, the binary relation on processes $\Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: T[\eta : \omega \Leftrightarrow \omega']$ (with $\omega, \omega' : \Omega$ and $\eta : \omega \Leftrightarrow \omega'$) is inductively defined as:

$$\begin{array}{l}
\Gamma; \Delta, y{:}A \vdash P \approx_{\mathsf{L}} Q :: T[\eta : \omega \Leftrightarrow \omega'] \;\text{ iff }\; \forall R_1, R_2 \text{ s.t. } R_1 \approx_{\mathsf{L}} R_2 :: y{:}A[\eta : \omega \Leftrightarrow \omega'],\\
\qquad \Gamma; \Delta \vdash (\nu y)(\hat{\omega}(P) \mid \hat{\omega}(R_1)) \approx_{\mathsf{L}} (\nu y)(\hat{\omega}'(Q) \mid \hat{\omega}'(R_2)) :: T[\eta : \omega \Leftrightarrow \omega']\\[6pt]
\Gamma, u{:}A; \Delta \vdash P \approx_{\mathsf{L}} Q :: T[\eta : \omega \Leftrightarrow \omega'] \;\text{ iff }\; \forall R_1, R_2 \text{ s.t. } R_1 \approx_{\mathsf{L}} R_2 :: y{:}A[\eta : \omega \Leftrightarrow \omega'],\\
\qquad \Gamma; \Delta \vdash (\nu u)(\hat{\omega}(P) \mid\, !u(y).\hat{\omega}(R_1)) \approx_{\mathsf{L}} (\nu u)(\hat{\omega}'(Q) \mid\, !u(y).\hat{\omega}'(R_2)) :: T[\eta : \omega \Leftrightarrow \omega']
\end{array}$$

For the sake of readability we often omit the $\eta : \omega \Leftrightarrow \omega'$ portion of $\approx_{\mathsf{L}}$, which is henceforth implicitly universally quantified. Thus, we write $\Omega; \Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: z{:}A$ (or $P \approx_{\mathsf{L}} Q$) iff the two given processes are logically equivalent for all consistent instantiations of their type variables.

It is instructive to inspect the clause for type input (∀X.A): the two processes must be able to match inputs of any pair of *related* types (i.e. types related by a candidate), such that the continuations are related at the open type A with the appropriate type variable instantiations, following Girard [16]. The power of this style of logical relation arises from a combination of the extensional flavour of the equivalence and the fact that polymorphic equivalences do not require the same type to be instantiated in both processes, but rather that the types are *related* (via a suitable equivalence candidate relation).

### **Theorem 2.3 (Properties of Logical Equivalence** [7])

**Parametricity:** *If* $\Omega; \Gamma; \Delta \vdash P :: z{:}A$ *then, for all* $\omega, \omega' : \Omega$ *and* $\eta : \omega \Leftrightarrow \omega'$, *we have* $\Gamma; \Delta \vdash \hat{\omega}(P) \approx_{\mathsf{L}} \hat{\omega}'(P) :: z{:}A[\eta : \omega \Leftrightarrow \omega']$*.*

**Soundness:** *If* $\Omega; \Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: z{:}A$ *then* $C[P] \cong C[Q] :: z{:}A$*, for any closing* $C[-]$*.*

**Completeness:** *If* $\Omega; \Gamma; \Delta \vdash P \cong Q :: z{:}A$ *then* $\Omega; \Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: z{:}A$*.*

### **3 To Linear-F and Back**

We now develop our mutually inverse and fully abstract encodings between Polyπ and a linear polymorphic λ-calculus [55] that we dub Linear-F. We first introduce the syntax and typing of the linear λ-calculus and then proceed to detail our encodings and their properties (we omit typing ascriptions from the existential polymorphism constructs for readability).

**Definition 3.1 (Linear-F).** The syntax of terms M,N and types A, B of Linear-F is given below.

$$\begin{array}{rcl}
M, N & ::= & \lambda x{:}A.M \mid M\ N \mid M \otimes N \mid \mathtt{let}\ x \otimes y = M\ \mathtt{in}\ N \mid\ !M \mid \mathtt{let}\ !u = M\ \mathtt{in}\ N\\
& \mid & \Lambda X.M \mid M[A] \mid \mathtt{pack}\ A\ \mathtt{with}\ M \mid \mathtt{let}\ (X, y) = M\ \mathtt{in}\ N \mid \mathtt{let}\ \mathbf{1} = M\ \mathtt{in}\ N \mid \langle\rangle \mid \mathsf{T} \mid \mathsf{F}\\[4pt]
A, B & ::= & A \multimap B \mid A \otimes B \mid\ !A \mid \forall X.A \mid \exists X.A \mid X \mid \mathbf{1} \mid \mathbf{2}
\end{array}$$

The syntax of types is that of the multiplicative and exponential fragments of second-order intuitionistic linear logic: λx:A.M denotes linear λ-abstraction; M N denotes application; M ⊗ N denotes the multiplicative pairing of M and N, as reflected in its elimination form let x ⊗ y = M in N, which simultaneously deconstructs the pair M, binding its first and second projections to x and y in N, respectively; !M denotes a term M that does not use any linear variables and so may be used an arbitrary number of times; let !u = M in N binds the underlying exponential term of M as u in N; ΛX.M is the type abstraction former; M[A] stands for type application; pack A with M is the existential type introduction form, where M is a term in which the existentially typed variable is instantiated with A; let (X, y) = M in N unpacks an existential package M, binding the representation type to X and the underlying term to y in N; the multiplicative unit **1** has as introduction form the nullary pair ⟨⟩ and is eliminated by the construct let **1** = M in N, where M is a term of type **1**. Booleans (type **2** with values T and F) are the basic observable.
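As a simple illustration of the syntax, the polymorphic identity function and a trivial existential package can be written as:

$$\Lambda X.\lambda x{:}X.x \;:\; \forall X.X \multimap X \qquad\qquad \mathtt{pack}\ \mathbf{1}\ \mathtt{with}\ \langle\rangle \;:\; \exists X.X$$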

The typing judgment in Linear-F is given as $\Omega; \Gamma; \Delta \vdash M : A$, following the DILL formulation of linear logic [3], stating that term M has type A in a linear context Δ (i.e. bindings for linear variables x:B), intuitionistic context Γ (i.e. bindings for intuitionistic variables u:B) and type variable context Ω. The typing rules are standard [7]. The operational semantics of the calculus is the expected call-by-name semantics with commuting conversions [27]. We write ⇓ for the evaluation relation. We write ∼= for the largest typed congruence that is consistent with the observables of type **2** (i.e. a so-called Morris-style equivalence as in [5]).

### **3.1 Encoding Linear-F into Session** *π***-Calculus**

We define a translation from Linear-F to Polyπ generalising the one from [49], accounting for polymorphism and multiplicative pairs. We translate typing derivations of λ-terms to those of π-calculus terms (we omit the full typing derivation for the sake of readability).

Proof theoretically, the λ-calculus corresponds to a proof term assignment for natural deduction presentations of logic, whereas the session π-calculus from § 2 corresponds to a proof term assignment for sequent calculus. Thus, we obtain a translation from λ-calculus to the session π-calculus by considering the proof theoretic content of the constructive proof of soundness of the sequent calculus wrt natural deduction. Following Gentzen [14], the translation from natural deduction to sequent calculus maps introduction rules to the corresponding right rules and elimination rules to a combination of the corresponding left rule, cut and/or identity.

Since typing in the session calculus identifies a distinguished channel along which a process offers a session, the translation of λ-terms is parameterised by a "result" channel along which the behaviour of the λ-term is implemented. Given a λ-term $M$, the process $[\![M]\!]_z$ encodes the behaviour of $M$ along the session channel $z$. We enforce that the type **2** of booleans and its two constructors are consistently translated to their polymorphic Church encodings before applying the translation to Polyπ. Thus, type **2** is first translated to $\forall X.{!X} \multimap {!X} \multimap X$, the value T to $\Lambda X.\lambda u{:}{!X}.\lambda v{:}{!X}.\mathtt{let}\ !x = u\ \mathtt{in}\ \mathtt{let}\ !y = v\ \mathtt{in}\ x$ and the value F to $\Lambda X.\lambda u{:}{!X}.\lambda v{:}{!X}.\mathtt{let}\ !x = u\ \mathtt{in}\ \mathtt{let}\ !y = v\ \mathtt{in}\ y$. Such representations of the booleans are adequate up to parametricity [6] and suitable for our purposes of relating the session calculus (which has no primitive notion of value or result type) with the λ-calculus, precisely due to the tight correspondence between the two calculi.
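Under type erasure (dropping $\Lambda$, type application and the $!$ constructor), these Church encodings behave as the familiar Church booleans, selecting their first or second argument; the following Python sketch, which ignores linearity, illustrates this:

```python
# Type-erased Church booleans corresponding to the encodings in the text:
#   T = ΛX.λu:!X.λv:!X. let !x = u in let !y = v in x   -- selects u
#   F = ΛX.λu:!X.λv:!X. let !x = u in let !y = v in y   -- selects v
T = lambda u: lambda v: u
F = lambda u: lambda v: v

def if_then_else(b, t, e):
    """A Church boolean applied to two alternatives acts as a conditional."""
    return b(t)(e)

print(if_then_else(T, "then-branch", "else-branch"))  # then-branch
print(if_then_else(F, "then-branch", "else-branch"))  # else-branch
```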

**Definition 3.2 (From Linear-F to Poly**π**).** $[\![\Omega]\!]; [\![\Gamma]\!]; [\![\Delta]\!] \vdash [\![M]\!]_z :: z{:}[\![A]\!]$ denotes the translation of contexts, types and terms from Linear-F to the polymorphic session calculus. The translations on contexts and types are the identity function. Booleans and their values are first translated to their Church encodings as specified above. The translation on λ-terms is described below and given in full in [52]:


To translate a (linear) λ-abstraction $\lambda x{:}A.M$, which corresponds to the proof term for the introduction rule for $\multimap$, we map it to the corresponding $\multimap\mathsf{R}$ rule, thus obtaining a process $z(x).[\![M]\!]_z$ that inputs along the result channel $z$ a channel $x$ which will be used in $[\![M]\!]_z$ to access the function argument. To encode the application $M\ N$, we compose (i.e. cut) $[\![M]\!]_x$, where $x$ is a fresh name, with a process that provides the (encoded) function argument by outputting along $x$ a channel $y$ which offers the behaviour of $[\![N]\!]_y$. After the output is performed, the type of $x$ is that of the function's codomain and thus we conclude by forwarding (i.e. the id rule) between $x$ and the result channel $z$.

The encoding for polymorphism follows a similar pattern: to encode the abstraction $\Lambda X.M$, we receive along the result channel a type that is bound to $X$ and proceed inductively. To encode the type application $M[A]$, we compose the encoding of the abstraction $M$ in parallel with a process that sends $A$ to it, and forward accordingly. Finally, the encoding of the existential package $\mathtt{pack}\ A\ \mathtt{with}\ M$ maps to an output of the type $A$ followed by the behaviour $[\![M]\!]_z$, with the encoding of the elimination form $\mathtt{let}\ (X, y) = M\ \mathtt{in}\ N$ composing the translation of the term of existential type $M$ with a process performing the appropriate type input and proceeding as $[\![N]\!]_z$.
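In summary, the clauses discussed above can be sketched as follows (a reconstruction from the prose description; the complete table is in [52]):

$$\begin{array}{rcl}
[\![\lambda x{:}A.M]\!]_z &=& z(x).[\![M]\!]_z\\[2pt]
[\![M\ N]\!]_z &=& (\nu x)([\![M]\!]_x \mid \overline{x}\langle y\rangle.([\![N]\!]_y \mid [x \leftrightarrow z]))\\[2pt]
[\![\Lambda X.M]\!]_z &=& z(X).[\![M]\!]_z\\[2pt]
[\![M[A]]\!]_z &=& (\nu x)([\![M]\!]_x \mid x\langle A\rangle.[x \leftrightarrow z])\\[2pt]
[\![\mathtt{pack}\ A\ \mathtt{with}\ M]\!]_z &=& z\langle A\rangle.[\![M]\!]_z\\[2pt]
[\![\mathtt{let}\ (X,y) = M\ \mathtt{in}\ N]\!]_z &=& (\nu y)([\![M]\!]_y \mid y(X).[\![N]\!]_z)
\end{array}$$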

*Example 3.3 (Encoding of Linear-F).* Consider the following λ-term corresponding to a polymorphic pairing function (recall that we write $\overline{z}\langle w\rangle.P$ for $(\nu w)\overline{z}\langle w\rangle.P$):

$$M \triangleq \Lambda X.\Lambda Y.\lambda x{:}X.\lambda y{:}Y.\, x \otimes y \quad \text{and} \quad N \triangleq ((M[A][B]\ M_1)\ M_2)$$

Then we have, with $\tilde{x} = x_1x_2x_3x_4$:

$$\begin{array}{rl}
[\![N]\!]_z \equiv & (\nu\tilde{x})([\![M]\!]_{x_1} \mid x_1\langle A\rangle.[x_1 \leftrightarrow x_2] \mid x_2\langle B\rangle.[x_2 \leftrightarrow x_3] \mid {}\\
& \ \overline{x_3}\langle x\rangle.([\![M_1]\!]_x \mid [x_3 \leftrightarrow x_4]) \mid \overline{x_4}\langle y\rangle.([\![M_2]\!]_y \mid [x_4 \leftrightarrow z]))\\[4pt]
\equiv & (\nu\tilde{x})(x_1(X).x_1(Y).x_1(x).x_1(y).\overline{x_1}\langle w\rangle.([x \leftrightarrow w] \mid [y \leftrightarrow x_1]) \mid x_1\langle A\rangle.[x_1 \leftrightarrow x_2] \mid {}\\
& \ x_2\langle B\rangle.[x_2 \leftrightarrow x_3] \mid \overline{x_3}\langle x\rangle.([\![M_1]\!]_x \mid [x_3 \leftrightarrow x_4]) \mid \overline{x_4}\langle y\rangle.([\![M_2]\!]_y \mid [x_4 \leftrightarrow z]))
\end{array}$$

We can observe that $N \longrightarrow^{+} ((\lambda x{:}A.\lambda y{:}B.\, x \otimes y)\ M_1)\ M_2 \longrightarrow^{+} M_1 \otimes M_2$. At the process level, each reduction corresponding to a redex of type application is simulated by two reductions, obtaining:

$$\begin{array}{rl}
[\![N]\!]_z \longrightarrow^{+} & (\nu x_3, x_4)(x_3(x).x_3(y).\overline{x_3}\langle w\rangle.([x \leftrightarrow w] \mid [y \leftrightarrow x_3]) \mid {}\\
& \ \overline{x_3}\langle x\rangle.([\![M_1]\!]_x \mid [x_3 \leftrightarrow x_4]) \mid \overline{x_4}\langle y\rangle.([\![M_2]\!]_y \mid [x_4 \leftrightarrow z])) = P
\end{array}$$

The reductions corresponding to the β-redexes clarify the way in which the encoding represents substitution of terms for variables via fine-grained name passing. Consider $[\![M_1 \otimes M_2]\!]_z = \overline{z}\langle w\rangle.([\![M_1]\!]_w \mid [\![M_2]\!]_z)$ and

$$P \longrightarrow^{+} (\nu x, y)([\![M_1]\!]_x \mid [\![M_2]\!]_y \mid \overline{z}\langle w\rangle.([x \leftrightarrow w] \mid [y \leftrightarrow z]))$$

The encoding of the pairing of $M_1$ and $M_2$ outputs a fresh name $w$ which will denote the behaviour of (the encoding of) $M_1$, and then the behaviour of the encoding of $M_2$ is offered on $z$. The reduct of $P$ outputs a fresh name $w$ which is then identified with $x$ and thus denotes the behaviour of $[\![M_1]\!]_w$. The channel $z$ is identified with $y$ and thus denotes the behaviour of $[\![M_2]\!]_z$, making the two processes listed above equivalent. This informal reasoning exposes the insights that justify the operational correspondence of the encoding. Proof-theoretically, these equivalences simply map to commuting conversions which push the processes $[\![M_1]\!]_x$ and $[\![M_2]\!]_z$ under the output on $z$.
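Under type erasure, the behaviour of Example 3.3 can be replayed with ordinary closures; the following Python sketch, ignoring types and linearity, mirrors the reduction $N \longrightarrow^{+} M_1 \otimes M_2$:

```python
# Erased version of Example 3.3: M = ΛX.ΛY.λx:X.λy:Y.x ⊗ y becomes a curried
# pairing function; the type applications [A][B] simply vanish under erasure.
M = lambda x: lambda y: (x, y)

M1, M2 = "first", "second"      # arbitrary stand-ins for the terms M1, M2
N = M(M1)(M2)                   # N = ((M[A][B] M1) M2) after erasure
print(N)  # ('first', 'second')
```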

#### **Theorem 3.4 (Operational Correspondence)**

$$\begin{array}{l}
\text{– If } \Omega; \Gamma; \Delta \vdash M : A \text{ and } M \longrightarrow N \text{ then } [\![M]\!]_z \Longrightarrow P \text{ such that } [\![N]\!]_z \approx_{\mathsf{L}} P\\
\text{– If } [\![M]\!]_z \longrightarrow P \text{ then } M \longrightarrow^{+} N \text{ and } [\![N]\!]_z \approx_{\mathsf{L}} P
\end{array}$$

#### **3.2 Encoding Session** *π***-calculus to Linear-F**

Just as the proof theoretic content of the soundness of sequent calculus wrt natural deduction induces a translation from λ-terms to session-typed processes, the *completeness* of the sequent calculus wrt natural deduction induces a translation from the session calculus to the λ-calculus. This mapping identifies sequent calculus right rules with the introduction rules of natural deduction and left rules with elimination rules combined with (type-preserving) substitution. Crucially, the mapping is defined on *typing derivations*, enabling us to consistently identify when a process uses a session (i.e. left rules) or, dually, when a process offers a session (i.e. right rules).

$$(\multimap\mathsf{R})\ \dfrac{\Delta, x{:}A \vdash P :: z{:}B}{\Delta \vdash z(x).P :: z{:}A \multimap B} \;\mapsto\; \dfrac{\Delta, x{:}A \vdash \{\!\!\{P\}\!\!\} : B}{\Delta \vdash \lambda x{:}A.\{\!\!\{P\}\!\!\} : A \multimap B}$$

$$(\multimap\mathsf{L})\ \dfrac{\Delta_1 \vdash P :: y{:}A \quad \Delta_2, x{:}B \vdash Q :: z{:}C}{\Delta_1, \Delta_2, x{:}A \multimap B \vdash (\nu y)\overline{x}\langle y\rangle.(P \mid Q) :: z{:}C} \;\mapsto\; \{\!\!\{Q\}\!\!\}\{(x\ \{\!\!\{P\}\!\!\})/x\} : C$$

**Fig. 3.** Translation on typing derivations (excerpt – see [52])

**Definition 3.5 (From Poly**π **to Linear-F).** We write $\Omega; \Gamma; \Delta \vdash \{\!\!\{P\}\!\!\} : A$ for the translation from typing derivations in Polyπ to derivations in Linear-F. The translations on types and contexts are the identity function. The translation on processes is described below, where the typing rule at the root of the derivation determines the clause that applies (see Fig. 3 for an excerpt of the translation on typing derivations, where we write $\{\!\!\{P\}\!\!\}^{\Omega; \Gamma; \Delta \vdash z{:}A}$ to denote the translation of $\Omega; \Gamma; \Delta \vdash P :: z{:}A$; we omit $\Omega$ and $\Gamma$ when unchanged).


For instance, the encoding of a process $z(x).P :: z{:}A \multimap B$, typed by rule $\multimap\mathsf{R}$, results in the corresponding $\multimap\mathsf{I}$ introduction rule in the λ-calculus and thus is $\lambda x{:}A.\{\!\!\{P\}\!\!\}$. To encode the process $(\nu y)\overline{x}\langle y\rangle.(P \mid Q)$, typed by rule $\multimap\mathsf{L}$, we make use of substitution: given that the sub-process $Q$ is typed as $\Omega; \Gamma; \Delta', x{:}B \vdash Q :: z{:}C$, the encoding of the full process is given by $\{\!\!\{Q\}\!\!\}\{(x\ \{\!\!\{P\}\!\!\})/x\}$. The term $x\ \{\!\!\{P\}\!\!\}$ consists of the application of $x$ (of function type) to the argument $\{\!\!\{P\}\!\!\}$, thus ensuring that the term resulting from the substitution is of the appropriate type. We note that, for instance, the encoding of rule $\otimes\mathsf{L}$ does not need to appeal to substitution – the λ-calculus let-style rules can be mapped directly. Similarly, rule $\forall\mathsf{R}$ is mapped to type abstraction, whereas rule $\forall\mathsf{L}$, which types a process of the form $x\langle B\rangle.P$, maps to a substitution of the type application $x[B]$ for $x$ in $\{\!\!\{P\}\!\!\}$. The encoding of existential polymorphism is simpler due to the let-style elimination. We also highlight the encoding of the cut rule, which embodies parallel composition of two processes sharing a linear name and clarifies the use/offer duality of the intuitionistic calculus – the process that offers $P$ is encoded and substituted into the encoded user $Q$.
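The clauses singled out above can be summarised as follows (a sketch reconstructed from the description, with $\{\!\!\{-\}\!\!\}$ denoting the translation; see [52] for the full table):

$$\begin{array}{rcll}
\{\!\!\{z(x).P\}\!\!\} &=& \lambda x{:}A.\{\!\!\{P\}\!\!\} & (\multimap\mathsf{R})\\[2pt]
\{\!\!\{(\nu y)\overline{x}\langle y\rangle.(P \mid Q)\}\!\!\} &=& \{\!\!\{Q\}\!\!\}\{(x\ \{\!\!\{P\}\!\!\})/x\} & (\multimap\mathsf{L})\\[2pt]
\{\!\!\{z(X).P\}\!\!\} &=& \Lambda X.\{\!\!\{P\}\!\!\} & (\forall\mathsf{R})\\[2pt]
\{\!\!\{x\langle B\rangle.P\}\!\!\} &=& \{\!\!\{P\}\!\!\}\{(x[B])/x\} & (\forall\mathsf{L})\\[2pt]
\{\!\!\{[x \leftrightarrow z]\}\!\!\} &=& x & (\mathsf{id})\\[2pt]
\{\!\!\{(\nu x)(P \mid Q)\}\!\!\} &=& \{\!\!\{Q\}\!\!\}\{\{\!\!\{P\}\!\!\}/x\} & (\mathsf{cut})
\end{array}$$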

**Theorem 3.6.** *If* $\Omega; \Gamma; \Delta \vdash P :: z{:}A$ *then* $\Omega; \Gamma; \Delta \vdash \{\!\!\{P\}\!\!\} : A$*.*

*Example 3.7 (Encoding of Poly*π*).* Consider the following processes

$$P \triangleq z(X).z(Y).z(x).z(y).\overline{z}\langle w\rangle.([x \leftrightarrow w] \mid [y \leftrightarrow z]) \qquad Q \triangleq z\langle \mathbf{1}\rangle.z\langle \mathbf{1}\rangle.\overline{z}\langle x\rangle.\overline{z}\langle y\rangle.z(w).[w \leftrightarrow r]$$

with $\vdash P :: z{:}\forall X.\forall Y.X \multimap Y \multimap X \otimes Y$ and $z{:}\forall X.\forall Y.X \multimap Y \multimap X \otimes Y \vdash Q :: r{:}\mathbf{1}$. Then:

$$\begin{array}{rcl}
\{\!\!\{P\}\!\!\} &=& \Lambda X.\Lambda Y.\lambda x{:}X.\lambda y{:}Y.\, x \otimes y\\[2pt]
\{\!\!\{Q\}\!\!\} &=& \mathtt{let}\ x \otimes y = z[\mathbf{1}][\mathbf{1}]\ \mathtt{in}\ \mathtt{let}\ \mathbf{1} = y\ \mathtt{in}\ x\\[2pt]
\{\!\!\{(\nu z)(P \mid Q)\}\!\!\} &=& \mathtt{let}\ x \otimes y = (\Lambda X.\Lambda Y.\lambda x{:}X.\lambda y{:}Y.\, x \otimes y)[\mathbf{1}][\mathbf{1}]\ \mathtt{in}\ \mathtt{let}\ \mathbf{1} = y\ \mathtt{in}\ x
\end{array}$$

By the behaviour of $(\nu z)(P \mid Q)$, which consists of a sequence of cuts, and of its encoding, we have that $(\nu z)(P \mid Q) \longrightarrow^{+} \mathbf{0}$ and $\{\!\!\{(\nu z)(P \mid Q)\}\!\!\} \longrightarrow^{+} \{\!\!\{\mathbf{0}\}\!\!\} = \langle\rangle$.

In general, the translation of Definition 3.5 can introduce some distance between the immediate operational behaviour of a process and that of its corresponding λ-term, insofar as the translations of cuts (and of left rules to non-let-form elimination rules) make use of substitutions that can take place deep within the resulting term. Consider the process at the root of the following typing judgment $\Delta_1, \Delta_2, \Delta_3 \vdash (\nu x)(x(y).P_1 \mid (\nu y)\overline{x}\langle y\rangle.(P_2 \mid w(z).\mathbf{0})) :: w{:}\mathbf{1} \multimap \mathbf{1}$, derivable through a cut on session $x$ between instances of $\multimap\mathsf{R}$ and $\multimap\mathsf{L}$, where the continuation process $w(z).\mathbf{0}$ offers a session $w{:}\mathbf{1} \multimap \mathbf{1}$ (and so must use rule $\mathbf{1}\mathsf{L}$ on $x$). We have that: $(\nu x)(x(y).P_1 \mid (\nu y)\overline{x}\langle y\rangle.(P_2 \mid w(z).\mathbf{0})) \longrightarrow (\nu x, y)(P_1 \mid P_2 \mid w(z).\mathbf{0})$. However, the translation of the process above results in the term $\lambda z{:}\mathbf{1}.\mathtt{let}\ \mathbf{1} = ((\lambda y{:}A.\{\!\!\{P_1\}\!\!\})\ \{\!\!\{P_2\}\!\!\})\ \mathtt{in}\ \mathtt{let}\ \mathbf{1} = z\ \mathtt{in}\ \langle\rangle$, where the redex that corresponds to the process reduction is present but hidden under the binder for $z$ (corresponding to the input along $w$). Thus, to establish operational completeness we consider full β-reduction, denoted by $\longrightarrow_{\beta}$, i.e. enabling β-reductions under binders.

**Theorem 3.8 (Operational Completeness).** *Let* $\Omega; \Gamma; \Delta \vdash P :: z{:}A$*. If* $P \longrightarrow Q$ *then* $\{\!\!\{P\}\!\!\} \longrightarrow_{\beta}^{*} \{\!\!\{Q\}\!\!\}$*.*

In order to study the soundness direction, it is instructive to consider the typed process x:**1** ⊸ **1** ⊢ x̄⟨y⟩.(νz)(z(w).**0** | z̄⟨w⟩.**0**) :: v:**1** and its translation:

$$\begin{array}{l} \{\overline{x}\langle y\rangle. (\nu z)(z\langle w\rangle. \mathbf{0} \mid \overline{z}\langle w\rangle. \mathbf{0})\} = \{ (\nu z)(z\langle w\rangle. \mathbf{0} \mid \overline{z}\langle w\rangle. \mathbf{0})\}\{ (x\langle\rangle)/x\} \\ = \mathtt{let}\,\mathbf{1} = (\lambda w \mathbf{:1}. \mathtt{let}\,\mathbf{1} = w \,\mathbf{in}\,\langle\rangle)\langle\rangle \,\text{in}\,\mathtt{let}\,\mathbf{1} = x\langle\rangle \,\text{in}\,\langle\rangle \end{array}$$

The process above cannot reduce due to the output prefix on x, which cannot synchronise with a corresponding input action since there is no provider for x (i.e. the channel is in the left-hand side context). However, its encoding can exhibit the β-redex corresponding to the synchronisation along z, hidden by the prefix on x. The corresponding reductions hidden under prefixes in the encoding can be *soundly* exposed in the session calculus by appealing to the commuting conversions of linear logic (e.g. in the process above, the instance of rule ⊸L corresponding to the output on x can be commuted with the cut on z).

As shown in [36], commuting conversions are sound with respect to observational equivalence, and thus we formulate operational soundness through a notion of *extended* process reduction, which extends process reduction with the reductions that are induced by commuting conversions. Such a relation was also used for similar purposes in [5] and in [26], in a classical linear logic setting. For conciseness, we define extended reduction as a relation on *typed* processes modulo ≡.

**Definition 3.9 (Extended Reduction** [5]**).** We define → as the type-preserving relation on typed processes modulo ≡ generated by:

1. C[(νy)x̄⟨y⟩.P] | x(y).Q → C[(νy)(P | Q)];
2. C[(νy)x̄⟨y⟩.P] | !x(y).Q → C[(νy)(P | Q)] | !x(y).Q; and
3. (νx)(!x(y).Q) → **0**

where <sup>C</sup> is a (typed) process context which does not capture the bound name <sup>y</sup>.

**Theorem 3.10 (Operational Soundness).** *Let* Ω; Γ; Δ ⊢ P :: z:A *and* ⦅P⦆ −→ M*. There exists* Q *such that* P →∗ Q *and* ⦅Q⦆ =α M*.*

#### **3.3 Inversion and Full Abstraction**

Having established the operational preciseness of the encodings to-and-from Polyπ and Linear-F, we establish our main results for the encodings. Specifically, we show that the encodings are mutually inverse up-to behavioural equivalence (with *fullness* as its corollary), which then enables us to establish *full abstraction* for *both* encodings.

**Theorem 3.11 (Inverse).** *If* Ω; Γ; Δ ⊢ M : A *then* Ω; Γ; Δ ⊢ ⦅⟦M⟧z⦆ ≅ M : A*. Also, if* Ω; Γ; Δ ⊢ P :: z:A *then* Ω; Γ; Δ ⊢ ⟦⦅P⦆⟧z ≈L P :: z:A*.*

**Corollary 3.12 (Fullness).** *Let* Ω; Γ; Δ ⊢ P :: z:A*. Then* ∃M *s.t.* Ω; Γ; Δ ⊢ M : A *and* Ω; Γ; Δ ⊢ ⟦M⟧z ≈L P :: z:A*. Also, let* Ω; Γ; Δ ⊢ M : A*. Then* ∃P *s.t.* Ω; Γ; Δ ⊢ P :: z:A *and* Ω; Γ; Δ ⊢ ⦅P⦆ ≅ M : A*.*

We now state our full abstraction results. Given two Linear-F terms of the same type, equivalence in the image of the ⟦−⟧z translation can be used as a proof technique for contextual equivalence in Linear-F. This is called the *soundness* direction of full abstraction in the literature [18]; it is proved by showing that the relation generated by ⟦M⟧z ≈L ⟦N⟧z forms a contextual equivalence ≅. We then establish the *completeness* direction by contradiction, using fullness.

**Theorem 3.13 (Full Abstraction).** Ω; Γ; Δ ⊢ M ≅ N : A *iff* Ω; Γ; Δ ⊢ ⟦M⟧z ≈L ⟦N⟧z :: z:A*.*

We can straightforwardly combine the above full abstraction result with Theorem 3.11 to obtain full abstraction of the ⦅−⦆ translation.

**Theorem 3.14 (Full Abstraction).** Ω; Γ; Δ ⊢ P ≈L Q :: z:A *iff* Ω; Γ; Δ ⊢ ⦅P⦆ ≅ ⦅Q⦆ : A*.*

### **4 Applications of the Encodings**

In this section we develop applications of the encodings of the previous sections. Taking advantage of full abstraction and mutual inversion, we apply non-trivial properties from the theory of the λ-calculus to our session-typed process setting.

In *§* **4.1** we study inductive and coinductive sessions, arising through encodings of initial F-algebras and final F-coalgebras in the polymorphic λ-calculus.

In *§* **4.2** we study encodings for an extension of the core session calculus with term passing, where terms are derived from a simply-typed λ-calculus. Using the development of *§* **4.2** as a stepping stone, we generalise the encodings to a *higher-order* session calculus (*§* **4.3**), where processes can send, receive and execute other processes. We show full abstraction and mutual inversion theorems for the encodings from higher-order to first-order. As a consequence, we can straightforwardly derive a strong normalisation property for the higher-order process-passing calculus.

### **4.1 Inductive and Coinductive Session Types**

The study of polymorphism in the λ-calculus [1,6,19,40] has shown that parametric polymorphism is expressive enough to encode both inductive and coinductive types in a precise way, through a faithful representation of initial and final (co)algebras [28], without extending either the language of terms or the semantics of the calculus, giving a logical justification to the Church encodings of inductive datatypes such as lists and natural numbers. The polymorphic session calculus can express fairly intricate communication behaviours, including generic protocols through both existential and universal polymorphism (i.e. protocols that are parametric in their sub-protocols). Using our fully abstract encodings between the two calculi, we show that session polymorphism is expressive enough to encode inductive and coinductive sessions, "importing" the results for the λ-calculus, which may then be instantiated to provide a session-typed formulation of the encodings of data structures in the π-calculus of [30].

**Inductive and Coinductive Types in System F.** Exploring an algebraic interpretation of polymorphism where types are interpreted as functors, it can be shown that given a type F with a free variable X that occurs only positively (i.e. occurrences of X are on the left-hand side of an even number of function arrows), the polymorphic type <sup>∀</sup>X.((F(X) <sup>→</sup> <sup>X</sup>) <sup>→</sup> <sup>X</sup>) forms an initial <sup>F</sup>-algebra [1,42] (we write F(X) to denote that X occurs in F). This enables the representation of *inductively* defined structures using an algebraic or categorical justification. For instance, the natural numbers can be seen as the initial F-algebra of F(X) = **1** + X (where **1** is the unit type and + is the coproduct), and are thus *already present* in System F, in a precise sense, as the type <sup>∀</sup>X.((**<sup>1</sup>** <sup>+</sup> <sup>X</sup>) <sup>→</sup> <sup>X</sup>) <sup>→</sup> <sup>X</sup> (noting that both **1** and + can also be encoded in System F). A similar story can be told for *coinductively* defined structures, which correspond to final F-coalgebras and are representable with the polymorphic type <sup>∃</sup>X.(<sup>X</sup> <sup>→</sup> <sup>F</sup>(X)) <sup>×</sup> <sup>X</sup>, where × is a product type. In the remainder of this section we assume the positivity requirement on F mentioned above.
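As an executable illustration (ours, not from the paper), the representation of the naturals as the type ∀X.((**1** + X) → X) → X can be transcribed into untyped Python, with `None` standing for the left (unit) injection of **1** + X and any other value for the right injection:

```python
# Sketch: naturals as the initial algebra of F(X) = 1 + X.
# A natural is a function that consumes an F-algebra `alg : 1 + X -> X`.
def zero(alg):
    return alg(None)                    # 0 = the left (unit) injection

def succ(n):
    return lambda alg: alg(n(alg))      # n+1 passes the folded n to alg

# Interpreting into the F-algebra of machine integers:
def to_int(n):
    return n(lambda s: 0 if s is None else s + 1)

print(to_int(succ(succ(zero))))  # 2
```

The same shape underlies the session-typed version developed below, where the F-algebra is instead made available over a shared channel.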

While the complete formal development of the representation of inductive and coinductive types in System F would lead us too far astray, we summarise here the key concepts as they apply to the λ-calculus (the interested reader can refer to [19] for the full categorical details).

$$\begin{array}{ccc} F(T_i) & \xrightarrow{F(\mathsf{fold}[A](f))} & F(A)\\ {\scriptstyle\mathsf{in}}\downarrow & & \downarrow{\scriptstyle f}\\ T_i & \xrightarrow{\mathsf{fold}[A](f)} & A\\ & \text{(a)} & \end{array} \qquad\qquad \begin{array}{ccc} A & \xrightarrow{\mathsf{unfold}[A](f)} & T_f\\ {\scriptstyle f}\downarrow & & \downarrow{\scriptstyle\mathsf{out}}\\ F(A) & \xrightarrow{F(\mathsf{unfold}[A](f))} & F(T_f)\\ & \text{(b)} & \end{array}$$

**Fig. 4.** Diagrams for initial F-algebras and final F-coalgebras

To show that the polymorphic type <sup>T</sup><sup>i</sup> ≜ <sup>∀</sup>X.((F(X) <sup>→</sup> <sup>X</sup>) <sup>→</sup> <sup>X</sup>) is an initial F-algebra, one exhibits a pair of λ-terms, often dubbed fold and in, such that the diagram in Fig. 4(a) commutes (for any A, where F(f), for a λ-term f, denotes the functorial action of F applied to f), and, crucially, that fold is *unique*. When these conditions hold, we are justified in saying that T<sup>i</sup> is a least fixed point of F. Through a fairly simple calculation, it is easy to see that:

$$\begin{array}{l} \mathsf{fold} \triangleq \Lambda X.\lambda x{:}F(X) \to X.\lambda t{:}T_i.\,t[X](x) \\ \mathsf{in} \triangleq \lambda x{:}F(T_i).\Lambda X.\lambda y{:}F(X) \to X.\,y\,(F(\mathsf{fold}[X](y))(x)) \end{array}$$

satisfy the necessary equalities. To show uniqueness one appeals to *parametricity*, which allows us to prove that any function of the appropriate type is equivalent to fold. This property is often dubbed initiality or universality.
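The two λ-terms above can be transcribed almost literally into executable form. The following untyped Python sketch is ours: `fmap` stands for the functorial action of F, and F(X) = **1** + X is represented as `None | ('S', x)`:

```python
# Sketch of fold and in for T_i = ∀X.((F(X) -> X) -> X), F(X) = 1 + X.
def fmap(f, s):                 # functorial action F(f)
    return None if s is None else ('S', f(s[1]))

def fold(alg):                  # fold ≜ ΛX.λx:F(X)→X.λt:T_i. t[X](x)
    return lambda t: t(alg)

def inn(s):                     # in ≜ λx:F(T_i).ΛX.λy. y(F(fold[X](y))(x))
    return lambda alg: alg(fmap(fold(alg), s))

zero = inn(None)
one = inn(('S', zero))
two = inn(('S', one))

to_int = fold(lambda s: 0 if s is None else s[1] + 1)
print(to_int(two))  # 2
```

Running `to_int` is exactly a commuting instance of the diagram in Fig. 4(a), with A the type of machine integers.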

The construction of final F-coalgebras and their justification as *greatest* fixed points is dual. Assuming products in the calculus and taking <sup>T</sup><sup>f</sup> ≜ <sup>∃</sup>X.(<sup>X</sup> <sup>→</sup> <sup>F</sup>(X)) <sup>×</sup> <sup>X</sup>, we produce the <sup>λ</sup>-terms

$$\begin{array}{l} \mathsf{unfold} \triangleq \Lambda X.\lambda f{:}X \to F(X).\lambda x{:}X.\,\mathsf{pack}\,X\;\mathsf{with}\;(f,x) \\ \mathsf{out} \triangleq \lambda t{:}T_f.\,\mathsf{let}\,(X,(f,x)) = t\;\mathsf{in}\;F(\mathsf{unfold}[X](f))\,(f(x)) \end{array}$$

such that the diagram in Fig. 4(b) commutes and unfold is unique (again, up to parametricity). While the argument above applies to System F, a similar development can be made in Linear-F [6] by considering <sup>T</sup><sup>i</sup> ≜ <sup>∀</sup>X.!(F(X) ⊸ X) ⊸ X and <sup>T</sup><sup>f</sup> ≜ <sup>∃</sup>X.!(<sup>X</sup> ⊸ <sup>F</sup>(X)) <sup>⊗</sup> <sup>X</sup>. Reusing the same names for the sake of conciseness, the associated *linear* λ-terms are:

fold ≜ ΛX.λu:!(F(X) ⊸ X).λy:T_i.(y[X] u) : ∀X.!(F(X) ⊸ X) ⊸ T_i ⊸ X
in ≜ λx:F(T_i).ΛX.λy:!(F(X) ⊸ X).let !u = y in u (F(fold[X](!u))(x)) : F(T_i) ⊸ T_i
unfold ≜ ΛX.λu:!(X ⊸ F(X)).λx:X.pack X with u ⊗ x : ∀X.!(X ⊸ F(X)) ⊸ X ⊸ T_f
out ≜ λt:T_f.let (X, (u, x)) = t in let !f = u in F(unfold[X](!f)) (f(x)) : T_f ⊸ F(T_f)

**Inductive and Coinductive Sessions for Free.** As a consequence of full abstraction, we may appeal to the ⟦−⟧z encoding to derive representations of fold and unfold that satisfy the necessary algebraic properties. The derived processes are (recall that we write x̄⟨y⟩.P for the bound output (νy)x̄⟨y⟩.P):

⟦fold⟧_z ≜ z(X).z(u).z(y).(νw)((νx)([y ↔ x] | x̄⟨X⟩.[x ↔ w]) | w̄⟨v⟩.([u ↔ v] | [w ↔ z]))
⟦unfold⟧_z ≜ z(X).z(u).z(x).z̄⟨X⟩.z̄⟨y⟩.([u ↔ y] | [x ↔ z])

We can then show universality of the two constructions. We write P_{x,y} to single out that x and y are free in P, and P_{z,w} to denote the result of the capture-avoiding substitution of z and w for x and y in P. Let:

foldP(A)_{y1,y2} ≜ (νx)(⟦fold⟧_x | x̄⟨A⟩.x̄⟨v⟩.(ū⟨y⟩.[y ↔ v] | x̄⟨z⟩.([z ↔ y1] | [x ↔ y2])))
unfoldP(A)_{y1,y2} ≜ (νx)(⟦unfold⟧_x | x̄⟨A⟩.x̄⟨v⟩.(ū⟨y⟩.[y ↔ v] | x̄⟨z⟩.([z ↔ y1] | [x ↔ y2])))

where foldP(A)_{y1,y2} corresponds to the application of fold to an F-algebra A with the associated morphism F(A) ⊸ A available on the shared channel u, consuming an ambient session y1:T_i and offering y2:A. Similarly, unfoldP(A)_{y1,y2} corresponds to the application of unfold to an F-coalgebra A with the associated morphism A ⊸ F(A) available on the shared channel u, consuming an ambient session y1:A and offering y2:T_f.

**Theorem 4.1 (Universality of** foldP**).** *For all* Q *such that* X; u:F(X) ⊸ X; y1:T_i ⊢ Q :: y2:X*, we have* X; u:F(X) ⊸ X; y1:T_i ⊢ Q ≈L foldP(X)_{y1,y2} :: y2:X*.*

**Theorem 4.2 (Universality of** unfoldP**).** *For all* Q *and* F*-coalgebras* A *s.t.* ·; u:A ⊸ F(A); y1:A ⊢ Q :: y2:T_f*, we have that* ·; u:A ⊸ F(A); y1:A ⊢ Q ≈L unfoldP(A)_{y1,y2} :: y2:T_f*.*

*Example 4.3 (Natural Numbers).* We show how to represent the natural numbers as an inductive session type using <sup>F</sup>(X) = **<sup>1</sup>** <sup>⊕</sup> <sup>X</sup>, making use of in:

$$\mathtt{zero}_x \triangleq (\nu z)(z.\mathsf{inl}; \mathbf{0} \mid \llbracket\mathsf{in}(z)\rrbracket_x) \quad \mathtt{succ}_{y,x} \triangleq (\nu s)(s.\mathsf{inr}; [y \leftrightarrow s] \mid \llbracket\mathsf{in}(s)\rrbracket_x)$$

with Nat ≜ ∀X.!((**1** ⊕ X) ⊸ X) ⊸ X, where ⊢ zero_x :: x:Nat and y:Nat ⊢ succ_{y,x} :: x:Nat encode the representations of 0 and successor, respectively. The natural 1 would thus be represented by one_x ≜ (νy)(zero_y | succ_{y,x}). The behaviour of type Nat can be seen as that of a sequence of internal choices of arbitrary (but finite) length. We can then observe that the foldP process acts as a recursor. For instance, consider:

$$\mathsf{stepDec}_d \triangleq d(n).n.\mathsf{case}(\mathsf{zero}_d, [n \leftrightarrow d]) \quad \mathsf{dec}_{x,z} \triangleq (\nu u)(!u(d).\mathsf{stepDec}_d \mid \mathsf{foldP}(\mathsf{Nat})_{x,z})$$

with ⊢ stepDec_d :: d:(**1** ⊕ Nat) ⊸ Nat and x:Nat ⊢ dec_{x,z} :: z:Nat, where dec decrements a given natural number session on channel x. We have that:

(νx)(one_x | dec_{x,z}) ≡ (νx, y, u)(zero_y | succ_{y,x} | !u(d).stepDec_d | foldP(Nat)_{x,z}) ≈L zero_z
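The recursor reading of foldP can be replayed functionally. In the following sketch (ours; the session-level details are elided), the algebra mirrors stepDec: the zero branch restarts at zero, and the successor branch forwards its payload, so that applying it to one yields zero, matching the equivalence above:

```python
# Sketch: dec as a single fold with stepDec's algebra.
# F(X) = 1 ⊕ X is represented as None | ('S', x), as before.
def fmap(f, s):
    return None if s is None else ('S', f(s[1]))

def fold(alg):
    return lambda t: t(alg)

def inn(s):
    return lambda alg: alg(fmap(fold(alg), s))

zero = inn(None)
one = inn(('S', zero))

dec = fold(lambda s: zero if s is None else s[1])      # stepDec's algebra
to_int = fold(lambda s: 0 if s is None else s[1] + 1)
print(to_int(dec(one)))  # 0
```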

We note that the resulting encoding is reminiscent of the encoding of lists of [30] (where zero is the empty list and succ the cons cell). The main differences in the encodings arise due to our primitive notions of labels and forwarding, as well as due to the generic nature of in and fold.

*Example 4.4 (Streams).* We build on Example 4.3 by representing *streams* of natural numbers as a coinductive session type. We encode infinite streams of naturals with F(X) = Nat ⊗ X. Thus: NatStream ≜ ∃X.!(X ⊸ (Nat ⊗ X)) ⊗ X. The behaviour of a session of type NatStream amounts to an infinite sequence of outputs of channels of type Nat. Such an encoding enables us to construct the stream of all naturals nats (and the stream of all non-zero naturals oneNats):

$$\begin{array}{ll} \mathsf{genHdNext}_z \triangleq z(n).\overline{z}\langle y\rangle.(\overline{n}\langle n'\rangle.[n'\leftrightarrow y]\mid\, !z(w).\overline{n}\langle n'\rangle.\mathsf{succ}_{n',w})\\ \mathsf{nats}_y \triangleq (\nu x,u)(\mathsf{zero}_x\mid\, !u(z).\mathsf{genHdNext}_z\mid \mathsf{unfoldP}(!\mathsf{Nat})_{x,y})\\ \mathsf{oneNats}_y \triangleq (\nu x,u)(\mathsf{one}_x\mid\, !u(z).\mathsf{genHdNext}_z\mid \mathsf{unfoldP}(!\mathsf{Nat})_{x,y}) \end{array}$$

with genHdNext_z :: z:!Nat ⊸ Nat ⊗ !Nat and both nats_y and oneNats_y :: y:NatStream. genHdNext_z is a helper that generates the current head of the stream and the next element. As expected, the following process implements a session that "unrolls" the stream once, providing the head of the stream and then behaving as the rest of the stream (recall that out : T_f ⊸ F(T_f)).

$$(\nu x)(\mathtt{nats}_x \mid \llbracket\mathsf{out}(x)\rrbracket_y) :: y{:}\mathsf{Nat} \otimes \mathsf{NatStream}$$
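The coalgebraic structure of the example can be sketched in executable form (ours, with the session and linearity aspects elided): a stream packs a step function, playing genHdNext's role of producing the head and the next seed, together with a seed, and out unrolls one layer:

```python
# Sketch: NatStream as ∃X.(X -> Nat ⊗ X) packed with a seed.
def unfold(step, seed):
    return (step, seed)                 # pack X with (step, seed)

def out(t):
    step, seed = t
    hd, nxt = step(seed)
    return hd, unfold(step, nxt)        # head ⊗ rest of the stream

gen_hd_next = lambda n: (n, n + 1)      # genHdNext's role
nats = unfold(gen_hd_next, 0)           # 0, 1, 2, ...
one_nats = unfold(gen_hd_next, 1)       # 1, 2, 3, ...

hd, rest = out(nats)
print(hd, out(rest)[0], out(one_nats)[0])  # 0 1 1
```

Unrolling nats once and dropping the head yields one_nats, mirroring the process equivalence shown at the end of this example.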

We note a peculiarity of the interaction of linearity with the stream encoding: a process that begins to deconstruct a stream has no way of "bottoming out" and stopping. One cannot, for instance, extract the first element of a stream of naturals and stop unrolling the stream in a well-typed way. We can, however, easily encode a "terminating" stream of all natural numbers via F(X) = (Nat ⊗ !X) by replacing genHdNext_z with the generator given as:

$$\mathsf{genHdNextText}_z \triangleq z(n).\overline{z}\langle y\rangle.(\overline{n}\langle n'\rangle.[n'\leftrightarrow y] \mid\, !z(w).!w(w').\overline{n}\langle n'\rangle.\mathsf{succ}_{n',w'})$$

It is then easy to see that a usage of ⟦out(x)⟧_y results in a session of type Nat ⊗ !NatStream, enabling us to discard the stream as needed. One can replay this argument with the operator F(X) = (!Nat ⊗ X) to enable discarding of stream elements. Assuming such modifications, we can then show:

(νy)((νx)(nats_x | ⟦out(x)⟧_y) | y(n).[y ↔ z]) ≈L oneNats_z :: z:NatStream

### **4.2 Communicating Values – Sess***πλ*

We now consider a session calculus extended with a data layer obtained from a λ-calculus (whose terms are ranged over by M,N and types by τ,σ). We dub this calculus Sessπλ.

$$\begin{array}{lcl} P,Q & ::= \cdots \mid x\langle M\rangle.P \mid x(y).P & & A,B \quad ::= \cdots \mid \tau \wedge A \mid \tau \supset A\\ M,N & ::= \lambda x \colon \tau.M \mid MN \mid x & & \tau,\sigma \quad ::= \cdots \mid \tau \to \sigma \end{array}$$

Without loss of generality, we consider the data layer to be simply-typed, with a call-by-name semantics, satisfying the usual type safety properties. The typing judgment for this calculus is Ψ ⊢ M : τ. We omit session polymorphism for the sake of conciseness, restricting processes to communication of data and (session) channels. The typing judgment for processes is thus modified to Ψ; Γ; Δ ⊢ P :: z:A, where Ψ is an intuitionistic context that accounts for variables in the data layer. The rules for the relevant process constructs are (all other rules simply propagate the Ψ context from conclusion to premises):

$$\frac{\Psi \vdash M : \tau \quad \Psi; \Gamma; \Delta \vdash P :: z{:}A}{\Psi; \Gamma; \Delta \vdash z\langle M\rangle.P :: z{:}\tau \wedge A}\,(\wedge\mathsf{R}) \qquad \frac{\Psi, y{:}\tau; \Gamma; \Delta, x{:}A \vdash Q :: z{:}C}{\Psi; \Gamma; \Delta, x{:}\tau \wedge A \vdash x(y).Q :: z{:}C}\,(\wedge\mathsf{L})$$

$$\frac{\Psi, x{:}\tau; \Gamma; \Delta \vdash P :: z{:}A}{\Psi; \Gamma; \Delta \vdash z(x).P :: z{:}\tau \supset A}\,(\supset\mathsf{R}) \qquad \frac{\Psi \vdash M : \tau \quad \Psi; \Gamma; \Delta, x{:}A \vdash Q :: z{:}C}{\Psi; \Gamma; \Delta, x{:}\tau \supset A \vdash x\langle M\rangle.Q :: z{:}C}\,(\supset\mathsf{L})$$

With the reduction rule given by:<sup>1</sup> x⟨M⟩.P | x(y).Q −→ P | Q{M/y}. With a simple extension to our encodings, we may eliminate the data layer by encoding data objects as processes, showing that, from an expressiveness point of view, data communication is orthogonal to the framework. We note that the data language we are considering is *not* linear, and the usage discipline of data in processes is itself also not linear.
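The reduction rule above is a substitution-based rendezvous; as a minimal sketch (ours), the receiver's continuation can be modelled as a closure into which the sent term is substituted:

```python
# Sketch of x⟨M⟩.P | x(y).Q −→ P | Q{M/y}: the payload M is
# substituted into the receiver's continuation λy.Q.
def communicate(M, Q_cont):
    return Q_cont(M)            # Q{M/y}

print(communicate(21, lambda y: y * 2))  # 42
```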

**To First-Order Processes.** We now introduce our encoding for Sessπλ, defined inductively on session types, processes, types and λ-terms (we omit the purely inductive cases on session types and processes for conciseness). As before, the encoding on processes is defined on *typing derivations*, where we indicate the typing rule at the root of the typing derivation.

$$\begin{array}{lll} \llbracket \tau \wedge A\rrbracket \triangleq\ !\llbracket \tau\rrbracket \otimes \llbracket A\rrbracket & \llbracket \tau \supset A\rrbracket \triangleq\ !\llbracket \tau\rrbracket \multimap \llbracket A\rrbracket & \llbracket \tau \to \sigma\rrbracket \triangleq\ !\llbracket \tau\rrbracket \multimap \llbracket \sigma\rrbracket \\[4pt] (\wedge\mathsf{R})\ \llbracket z\langle M\rangle.P\rrbracket \triangleq \overline{z}\langle x\rangle.(!x(y).\llbracket M\rrbracket_y \mid \llbracket P\rrbracket) & (\wedge\mathsf{L})\ \llbracket x(y).P\rrbracket \triangleq x(y).\llbracket P\rrbracket & \\[2pt] (\supset\mathsf{R})\ \llbracket z(x).P\rrbracket \triangleq z(x).\llbracket P\rrbracket & (\supset\mathsf{L})\ \llbracket x\langle M\rangle.P\rrbracket \triangleq \overline{x}\langle y\rangle.(!y(w).\llbracket M\rrbracket_w \mid \llbracket P\rrbracket) & \end{array}$$

<sup>1</sup> For simplicity, in this section, we define the process semantics through a reduction relation.

$$\begin{array}{ll} \llbracket x\rrbracket_z \triangleq \overline{x}\langle y\rangle.[y \leftrightarrow z] & \llbracket \lambda x{:}\tau.M\rrbracket_z \triangleq z(x).\llbracket M\rrbracket_z\\ \llbracket M\ N\rrbracket_z \triangleq (\nu y)(\llbracket M\rrbracket_y \mid \overline{y}\langle x\rangle.(!x(w).\llbracket N\rrbracket_w \mid [y \leftrightarrow z])) \end{array}$$

The encoding addresses the non-linear usage of data elements in processes by encoding the types τ ∧ A and τ ⊃ A as !⟦τ⟧ ⊗ ⟦A⟧ and !⟦τ⟧ ⊸ ⟦A⟧, respectively. Thus, sending and receiving of data is codified as the sending and receiving of channels of type !⟦τ⟧, which can therefore be used non-linearly. Moreover, since data terms are themselves non-linear, the τ → σ type is encoded as !⟦τ⟧ ⊸ ⟦σ⟧, following Girard's embedding of intuitionistic logic in linear logic [15].
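Girard's decomposition τ → σ = !τ ⊸ σ can be sketched operationally with thunks (our illustration, not the paper's encoding): the !τ argument of a call-by-name function is a duplicable, unevaluated computation that the function body may demand zero or more times:

```python
# Sketch: !τ as a reusable thunk; a CBN function τ → σ becomes a
# linear function on thunks, used zero or more times in its body.
def twice_plus(x):       # uses its !-argument twice
    return x() + x()

def discard(x):          # uses its !-argument zero times
    return 42

arg = lambda: 3          # "!3": a thunk that can be demanded repeatedly
print(twice_plus(arg), discard(arg))  # 6 42
```

The replicated inputs !x(y).⟦M⟧_y in the process encoding play exactly this thunk-server role.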

At the level of processes, offering a session of type τ ∧ A (i.e. a process of the form z⟨M⟩.P) is encoded according to the translation of the type: we first send a *fresh* name x which will be used to access the encoding of the term M. Since M can be used an arbitrary number of times by the receiver, we guard the encoding of M with a replicated input, proceeding with the encoding of P accordingly. Using a session of type τ ⊃ A follows the same principle. The input cases (and the rest of the process constructs) are completely homomorphic.

The encoding of λ-terms follows Girard's decomposition of the intuitionistic function space [49]. The λ-abstraction is translated as input. Since variables in a λ-abstraction may be used non-linearly, the cases for variables and application are slightly more intricate: to encode the application M N we compose the encoding of M in parallel with a process that sends a "reference" to the function argument N, which is encoded using replication in order to handle the potential for zero or more usages of variables in the function body. Correspondingly, a variable is encoded by performing an output to trigger the replication and forwarding accordingly. Without loss of generality, we assume that variable names and their corresponding replicated counterparts match, which can be achieved through α-conversion before applying the translation. We exemplify our encoding as follows:

$$\begin{array}{l} \llbracket z(x).z\langle x\rangle.z\langle \lambda y{:}\sigma.x\rangle.\mathbf{0}\rrbracket = z(x).\overline{z}\langle w\rangle.(!w(u).\llbracket x\rrbracket_u \mid \overline{z}\langle v\rangle.(!v(i).\llbracket \lambda y{:}\sigma.x\rrbracket_i \mid \mathbf{0}))\\ \quad = z(x).\overline{z}\langle w\rangle.(!w(u).\overline{x}\langle y\rangle.[y \leftrightarrow u] \mid \overline{z}\langle v\rangle.(!v(i).i(y).\overline{x}\langle t\rangle.[t \leftrightarrow i] \mid \mathbf{0})) \end{array}$$

**Properties of the Encoding.** We discuss the correctness of our encoding. We can straightforwardly establish that the encoding preserves typing.

To show that our encoding is operationally sound and complete, we capture the interaction between substitution on λ-terms and the encoding into processes through logical equivalence. Consider the following reduction of a process:

$$\begin{aligned} &(\nu z)(z(x).z\langle x\rangle.z\langle \lambda y{:}\sigma.x\rangle.\mathbf{0} \mid z\langle \lambda w{:}\tau_0.w\rangle.P)\\ &\quad \longrightarrow (\nu z)(z\langle \lambda w{:}\tau_0.w\rangle.z\langle \lambda y{:}\sigma.\lambda w{:}\tau_0.w\rangle.\mathbf{0} \mid P) \end{aligned} \tag{1}$$

Given that substitution in the target session π-calculus amounts to renaming, whereas in the λ-calculus we replace a variable for a term, the relationship between the encoding of a substitution <sup>M</sup>{N/x} and the encodings of <sup>M</sup> and N corresponds to the composition of the encoding of M with that of N, but where the encoding of N is guarded by a replication, codifying a form of explicit non-linear substitution.

**Lemma 4.5 (Compositionality).** *Let* Ψ, x:τ ⊢ M : σ *and* Ψ ⊢ N : τ*. We have that* ⟦M{N/x}⟧_z ≈L (νx)(⟦M⟧_z | !x(y).⟦N⟧_y)*.*

Revisiting the process to the left of the arrow in Eq. 1 we have:


whereas the process to the right of the arrow is encoded as:


While the reduction of the encoded process and the encoding of the reduct differ syntactically, they are observationally equivalent: the latter inlines the replicated process behaviour that, in the former, is accessible on x. Having characterised substitution, we establish operational correspondence for the encoding.

### **Theorem 4.6 (Operational Correspondence)**

*1. If* Ψ ⊢ M : τ *and* ⟦M⟧_z −→ Q *then* M −→⁺ N *such that* ⟦N⟧_z ≈L Q*.*
*2. If* Ψ; Γ; Δ ⊢ P :: z:A *and* ⟦P⟧ −→ Q *then* P −→⁺ P′ *such that* ⟦P′⟧ ≈L Q*.*
*3. If* Ψ ⊢ M : τ *and* M −→ N *then* ⟦M⟧_z =⇒ P *such that* P ≈L ⟦N⟧_z*.*
*4. If* Ψ; Γ; Δ ⊢ P :: z:A *and* P −→ Q *then* ⟦P⟧ −→⁺ R *with* R ≈L ⟦Q⟧*.*

The process equivalence in Theorem 4.6 above need not be extended to account for data (although it would be relatively simple to do so), since the processes in the image of the encoding are fully erased of any data elements.

**Back to** λ**-Terms.** We extend our encoding from processes to λ-terms to cover Sessπλ. Our extended translation maps processes to linear λ-terms, with the session type τ ∧ A interpreted as a pair type where the first component is replicated. Dually, τ ⊃ A is interpreted as a function type where the domain type is replicated. The remaining session constructs are translated as in *§* **3.2**.

⦅τ ∧ A⦆ ≜ !⦅τ⦆ ⊗ ⦅A⦆  ⦅τ ⊃ A⦆ ≜ !⦅τ⦆ ⊸ ⦅A⦆  ⦅τ → σ⦆ ≜ !⦅τ⦆ ⊸ ⦅σ⦆
(∧R) ⦅z⟨M⟩.P⦆ ≜ !⦅M⦆ ⊗ ⦅P⦆  (∧L) ⦅x(y).P⦆ ≜ let y ⊗ x = x in let !y = y in ⦅P⦆
(⊃R) ⦅z(x).P⦆ ≜ λx:!⦅τ⦆.let !x = x in ⦅P⦆  (⊃L) ⦅x⟨M⟩.P⦆ ≜ ⦅P⦆{(x !⦅M⦆)/x}
⦅λx:τ.M⦆ ≜ λx:!⦅τ⦆.let !x = x in ⦅M⦆  ⦅M N⦆ ≜ ⦅M⦆ !⦅N⦆  ⦅x⦆ ≜ x

The treatment of non-linear components of processes is identical to our previous encoding: non-linear functions τ → σ are translated to linear functions of type !⦅τ⦆ ⊸ ⦅σ⦆; a process offering a session of type τ ∧ A (i.e. a process of the form z⟨M⟩.P, typed by rule ∧R) is translated to a pair in which the first component is the encoding of M prefixed with ! so that it may be used non-linearly, and the second is the encoding of P. Non-linear variables are handled at their respective binding sites: a process using a session of type τ ∧ A is encoded using the elimination form for the pair and the elimination form for the exponential; similarly, a process offering a session of type τ ⊃ A is encoded as a λ-abstraction where the bound variable is of type !⦅τ⦆. Thus, we use the elimination form for the exponential, ensuring that the typing is correct. We illustrate our encoding:

⦅z(x).z⟨x⟩.z⟨λy:σ.x⟩.**0**⦆ = λx:!τ.let !x = x in !x ⊗ !⦅λy:σ.x⦆ ⊗ ⟨⟩
 = λx:!τ.let !x = x in !x ⊗ !(λy:!σ.let !y = y in x) ⊗ ⟨⟩

**Properties of the Encoding.** Unsurprisingly, due to the logical correspondence between natural deduction and sequent calculus presentations of logic, our encoding satisfies both type soundness and operational correspondence (cf. Theorems 3.6, 3.8 and 3.10). The full development can be found in [52].

**Relating the Two Encodings.** We prove that the two encodings are mutually inverse and preserve the full abstraction properties (we write =β and =βη for β- and βη-equivalence, respectively).

**Theorem 4.7 (Inverse).** *If* Ψ; Γ; Δ ⊢ P :: z:A *then* ⟦⦅P⦆⟧z ≈L P*. Also, if* Ψ ⊢ M : τ *then* ⦅⟦M⟧z⦆ =β M*.*

The equivalences above are formulated between the composition of the encodings applied to P (resp. M) and the process (resp. λ-term) *after* applying the translation that embeds the non-linear components into their linear counterparts. This formulation matches more closely that of §3.3, which applies to linear calculi of which the *target* languages of this section are a strict subset (and avoids formalising process equivalence with terms). We also note that in this setting, observational equivalence and βη-equivalence coincide [3,31]. Moreover, the extensional flavour of ≈L includes η-like principles at the process level.

**Theorem 4.8.** *Let* · ⊢ M : τ *and* · ⊢ N : τ*.* M =βη N *iff* ⟦M⟧z ≈L ⟦N⟧z*. Also, let* · ⊢ P :: z:A *and* · ⊢ Q :: z:A*. We have that* P ≈L Q *iff* ⦅P⦆ =βη ⦅Q⦆*.*

We establish full abstraction for the encoding of λ-terms into processes (Theorem 4.8) in two steps: the completeness direction (i.e., from left to right) follows from operational completeness and strong normalisation of the λ-calculus, while the soundness direction uses operational soundness. The proof of Theorem 4.8 uses the same strategy as that of Theorem 3.14, appealing to the inverse theorems.

### **4.3 Higher-Order Session Processes – Sessπλ<sup>+</sup>**

We extend the value-passing framework of the previous section to account for process passing (i.e., the higher-order case) in a session-typed setting. As shown in [50], we achieve this by adding to the data layer a *contextual monad* that encapsulates (open) session-typed processes as data values, with a corresponding elimination form in the process layer. We dub this calculus Sessπλ<sup>+</sup>.

$$P,Q ::= \dots \mid x \leftarrow M \leftarrow \overline{y\_i}; Q \qquad\qquad M,N ::= \dots \mid \{x \leftarrow P \leftarrow \overline{y\_i:A\_i}\}$$

The type {x̄ⱼ:Aⱼ ⊢ z:A} is the type of a term which encapsulates an open process that uses the linear channels x̄ⱼ:Aⱼ and offers A along channel z. This formulation has the added benefit of formalising the integration of session-typed processes in a functional language and forms the basis for the concurrent programming language SILL [37,50]. The typing rules for the new constructs are (for simplicity we assume no shared channels in process monads):

$$\frac{\Psi; \cdot; \overline{x\_i : A\_i} \vdash P :: z{:}A}{\Psi \vdash \{z \leftarrow P \leftarrow \overline{x\_i : A\_i}\} : \{\overline{x\_i : A\_i} \vdash z{:}A\}} \;\;\{\}I$$

$$\frac{\Psi \vdash M : \{\overline{x\_i : A\_i} \vdash x{:}A\} \quad \Delta\_1 = \overline{y\_i : A\_i} \quad \Psi; \Gamma; \Delta\_2, x{:}A \vdash Q :: z{:}C}{\Psi; \Gamma; \Delta\_1, \Delta\_2 \vdash x \leftarrow M \leftarrow \overline{y\_i}; Q :: z{:}C} \;\;\{\}E$$

Rule {}I embeds processes in the term language by essentially quoting an open process that is well-typed according to the type specification in the monadic type. Dually, rule {}E allows processes to use monadic values through a composition that *consumes* some of the ambient channels in order to provide the monadic term with the necessary context (according to its type). These constructs are discussed in substantial detail in [50]. The reduction semantics of the process construct is given below (we tacitly assume that the names ȳᵢ and c do not occur in P and omit the congruence case):

$$(c \leftarrow \{z \leftarrow P \leftarrow \overline{x\_i{:}A\_i}\} \leftarrow \overline{y\_i};Q) \rightarrow (\nu c)(P\{\overline{y\_i}/\overline{x\_i}\}\{c/z\} \mid Q)$$

The semantics allows for the underlying monadic term M to evaluate to a (quoted) process P. The process P is then executed in parallel with the continuation Q, sharing the linear channel c for subsequent interactions. We illustrate the higher-order extension with the following typed process (we write {x ← P} when P does not depend on any linear channels and assume ⊢ Q :: d:Nat ∧ **1**):

$$P \triangleq (\nu c)(c \langle \{d \gets Q\} \rangle.c(x).\mathbf{0} \mid c(y).d \gets y; d(n).c \langle n \rangle.\mathbf{0})\tag{2}$$

Process P above gives an abstract view of a communication idiom where a process (the left-hand side of the parallel composition) sends another process Q which potentially encapsulates some complex computation. The receiver then *spawns* the execution of the received process and inputs from it a result value that is sent back to the original sender. An execution of P is given by:

$$\begin{array}{c} P \rightarrow (\nu c)(c(x).\mathbf{0} \mid d \leftarrow \{d \leftarrow Q\}; d(n).c\langle n\rangle.\mathbf{0}) \rightarrow (\nu c)(c(x).\mathbf{0} \mid (\nu d)(Q \mid d(n).c\langle n\rangle.\mathbf{0})) \\ \rightarrow ^+(\nu c)(c(x).\mathbf{0} \mid c\langle 42\rangle.\mathbf{0}) \rightarrow \mathbf{0} \end{array}$$
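The run of P above can be loosely mimicked with closures in a sequential language. This is only a hypothetical analogy (quoting as function definition, the elimination form {}E as application), not the actual process semantics:

```python
# Hypothetical analogy: the quoted process {d <- Q} becomes a closure.
def quoted_Q():
    return 42                  # Q offers the value 42 along d

def receiver(y):               # c(y). d <- y; d(n). c<n>.0
    n = y()                    # {}E: "spawn" the received quoted process
    return n                   # send the result back along c

assert receiver(quoted_Q) == 42
```

The analogy is lossy (there is no concurrency or channel discipline here), but it conveys why process passing reduces to passing a value that can later be "run".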

Given the seminal work of Sangiorgi [46], such a representation naturally raises the question of whether we can develop a *typed* encoding of higher-order processes into the first-order setting. Indeed, we can achieve such an encoding with a fairly simple extension of the encoding of §4.2 to Sessπλ<sup>+</sup>, by observing that monadic values are processes that may need to be provided with extra sessions in order to be executed correctly. For instance, a term of type {x:A ⊢ y:B} denotes a process that, given a session x of type A, will then offer y:B. Exploiting this observation, we encode this type as the session A ⊸ B, ensuring that subsequent usages of such a term are consistent with this interpretation.

⟦{x̄ⱼ:Aⱼ ⊢ z:A}⟧ ≜ ⟦A̅ⱼ⟧ ⊸ ⟦A⟧
⟦{x ← P ← ȳᵢ}⟧z ≜ z(y₀). … .z(yₙ).⟦P{z/x}⟧  (z ∉ fn(P))
⟦x ← M ← ȳᵢ; Q⟧ ≜ (νx)(⟦M⟧x | x̄⟨a₀⟩.([a₀ ↔ y₀] | … | x̄⟨aₙ⟩.([aₙ ↔ yₙ] | ⟦Q⟧) … ))

To encode the monadic type {x̄ⱼ:Aⱼ ⊢ z:A}, denoting the type of a process P such that x̄ⱼ:Aⱼ ⊢ P :: z:A, we require that the session in the image of the translation specifies a sequence of channel inputs with behaviours Aⱼ that make up the linear context. After the contextual aspects of the type are encoded, the session then offers the (encoded) behaviour of A. Thus, the encoding of the monadic type is ⟦A₀⟧ ⊸ … ⊸ ⟦Aₙ⟧ ⊸ ⟦A⟧, which we write as ⟦A̅ⱼ⟧ ⊸ ⟦A⟧. The encoding of monadic expressions adheres to this behaviour, first performing the necessary sequence of inputs and then proceeding inductively. Finally, the encoding of the elimination form for monadic expressions behaves dually, composing the encoding of the monadic expression with a sequence of outputs that instantiate the consumed names accordingly (via forwarding). The encoding of process P from Eq. 2 is thus:

⟦P⟧ = (νc)(⟦c̄⟨{d ← Q}⟩.c(x).**0**⟧ | ⟦c(y).d ← y; d(n).c̄⟨n⟩.**0**⟧)
 = (νc)(c̄⟨w⟩.(w(d).⟦Q⟧ | c(x).**0**) | c(y).(νd)(ȳ⟨b⟩.([b ↔ d] | d(n).c̄⟨m⟩.(n̄⟨e⟩.([e ↔ m] | **0**)))))

**Properties of the Encoding.** As in our previous development, we can show that our encoding for Sessπλ<sup>+</sup> is type sound and satisfies operational correspondence. The full development is omitted but can be found in [52].

We encode Sessπλ<sup>+</sup> into λ-terms, extending §4.2 with:

⦅{x̄ᵢ:Aᵢ ⊢ z:A}⦆ ≜ ⦅A̅ᵢ⦆ ⊸ ⦅A⦆   ⦅{x ← P ← w̄ᵢ}⦆ ≜ λw₀. … λwₙ.⦅P⦆
⦅x ← M ← ȳᵢ; Q⦆ ≜ ⦅Q⦆{(⦅M⦆ ȳᵢ)/x}

The encoding translates the monadic type {x̄ᵢ:Aᵢ ⊢ z:A} as a linear function type ⦅A̅ᵢ⦆ ⊸ ⦅A⦆, which captures the fact that the underlying value must be provided with terms satisfying the requirements of the linear context. At the level of terms, the encoding of the monadic term constructor follows its type specification, generating a nesting of λ-abstractions that closes the term and then proceeding inductively. For the process encoding, we translate the monadic application construct analogously to the translation of a linear cut, but applying the appropriate variables to the translated monadic term (which is of function type). We note the similarity between this encoding and that of the previous section, where monadic terms are translated to a sequence of inputs (here, a nesting of λ-abstractions). Our encoding satisfies type soundness and operational correspondence, as usual. Further showcasing the applications of our development, we obtain a novel strong normalisation result for this higher-order session calculus "for free", through the encoding to the λ-calculus.
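The λ-direction of the encoding can be sketched in the same hypothetical Python style (names and arities are illustrative only): the monadic constructor becomes a curried function over the resources it consumes, and the elimination form applies it before substituting into the continuation:

```python
# Hypothetical sketch: {x <- P <- w0, w1} as a curried function over the
# two resources w0, w1 that P consumes.
quoted = lambda w0: lambda w1: w0 + w1      # P combines its two channels

def eliminate(M, y0, y1, Q):
    # x <- M <- y0, y1; Q  becomes  Q{(M y0 y1)/x}
    return Q(M(y0)(y1))

assert eliminate(quoted, 20, 22, lambda x: x) == 42
```

The nesting of λ-abstractions plays the role that the sequence of channel inputs plays in the π-direction of the encoding.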

**Theorem 4.9 (Strong Normalisation).** *Let* Ψ; Γ; Δ ⊢ P :: z:A*. There is no infinite reduction sequence starting from* P*.*

**Theorem 4.10 (Inverse Encodings).** *If* Ψ; Γ; Δ ⊢ P :: z:A *then* ⟦⦅P⦆⟧z ≈L P*. Also, if* Ψ ⊢ M : τ *then* ⦅⟦M⟧z⦆ =β M*.*

**Theorem 4.11.** *Let* ⊢ M : τ*,* ⊢ N : τ*,* ⊢ P :: z:A *and* ⊢ Q :: z:A*.* M =βη N *iff* ⟦M⟧z ≈L ⟦N⟧z*, and* P ≈L Q *iff* ⦅P⦆ =βη ⦅Q⦆*.*

### **5 Related Work and Concluding Remarks**

**Process Encodings of Functions.** Toninho et al. [49] study encodings of the simply-typed λ-calculus in a logically motivated session π-calculus, via encodings to the linear λ-calculus. Our work differs in that they study neither polymorphism nor reverse encodings, and we provide deeper insights through applications of the encodings; full abstraction and inverse properties are not studied there.

Sangiorgi [43] uses a fully abstract compilation from the higher-order π-calculus (HOπ) to the π-calculus to study full abstraction for Milner's encodings of the λ-calculus. The work shows that Milner's encoding of the lazy λ-calculus can be recovered by restricting the semantic domain of processes (the so-called *restrictive* approach) or by enriching the λ-calculus with suitable constants. This work was later refined in [45], which does not use HOπ and considers an operational equivalence on λ-terms called *open applicative bisimulation*, which coincides with Lévy-Longo tree equality. The work [47] studies general conditions under which encodings of the λ-calculus in the π-calculus are fully abstract with respect to Lévy-Longo and Böhm trees, which are then applied to several encodings of the (call-by-name) λ-calculus. The works above deal with *untyped calculi*, and so reverse encodings are unfeasible. In a broader sense, our approach takes the restrictive route, using linear logic-based session typing and the induced observational equivalence. We use a λ-calculus with booleans as observables and reason with a Morris-style equivalence instead of tree equalities. It would be interesting future work to apply the conditions of [47] in our typed setting.

Wadler [54] shows a correspondence between a linear functional language with session types GV and a session-typed process calculus with polymorphism based on classical linear logic CP. Along the lines of this work, Lindley and Morris [26], in an exploration of inductive and coinductive session types through the addition of least and greatest fixed points to CP and GV, develop an encoding from a linear λ-calculus with session primitives (Concurrent μGV) to a pure linear λ-calculus (Functional μGV) via a CPS transformation. They also develop translations between μCP and Concurrent μGV, extending [25]. Mapping to the terminology used in our work [17], their encodings are shown to be operationally complete, but no results are shown for the operational soundness directions and neither full abstraction nor inverse properties are studied. In addition, their operational characterisations do not compose across encodings. For instance, while strong normalisation of Functional μGV implies the same property for Concurrent μGV through their operationally complete encoding, the encoding from μCP to μGV does not necessarily preserve this property.

Types for π-calculi delineate sequential behaviours by restricting composition and name usage, limiting the contexts in which processes can interact; typed equivalences therefore offer a *coarser* semantics than untyped ones. Berger et al. [5] study an encoding of System F in a polymorphic linear π-calculus, showing it to be fully abstract based on game semantics techniques. Their typing system and proofs are more complex due to the fine-grained constraints from game semantics. Moreover, they do not study a reverse encoding. Orchard and Yoshida [33] develop embeddings to and from PCF with parallel effects and a session-typed π-calculus, but only develop operational correspondence and semantic soundness results, leaving the full abstraction problem open.

**Polymorphism and Typed Behavioural Semantics.** The work of [7] studies parametric session polymorphism for the intuitionistic setting, developing a behavioural equivalence that captures parametricity, which is used in our paper (denoted ≈L). The work [39] introduces a typed bisimilarity for polymorphism in the π-calculus. Their bisimilarity is of an intensional flavour, whereas the one used in our work follows the extensional style of Reynolds [41]. Their typing discipline (originally from [53], which also develops type-preserving encodings of the polymorphic λ-calculus into the polymorphic π-calculus) differs significantly from the linear logic-based session typing of our work (e.g., theirs does not ensure deadlock-freedom). A key observation in their work is the coarser nature of typed equivalences with polymorphism (in analogy to those for IO-subtyping [38]) and their interaction with channel aliasing, suggesting a use of typed semantics and encodings of the π-calculus for fine-grained analyses of program behaviour.

**F-Algebras and Linear-F.** The use of initial and final (co)algebras to give a semantics to inductive and coinductive types dates back to Mendler [28], with their strong definability in System F appearing in [1,19]. The definability of inductive and coinductive types using parametricity also appears in [40], in the context of a logic for parametric polymorphism, and later in [6], in a linear variant of such a logic. The work of [55] studies parametricity for the polymorphic linear λ-calculus of this work, developing encodings of a few inductive types but not the initial (or final) algebraic encodings in their full generality. Inductive and coinductive session types in a logical process setting appear in [26,51]. Both works consider a calculus with built-in recursion: the former in an intuitionistic setting where a process that offers a (co)inductive protocol is composed with another that consumes the (co)inductive protocol, and the latter in a classical framework where composed recursive session types are dual to each other.

**Conclusion and Future Work.** This work answers the question of what kind of type discipline of the π-calculus can exactly capture, and is captured by, λ-calculus behaviours. Our answer is given by showing the first mutually inverse and fully abstract encodings between two calculi with polymorphism, one being the Polyπ session calculus based on intuitionistic linear logic, and the other (a linear) System F. This further demonstrates that the linear logic-based articulation of name-passing interactions originally proposed by [8] (and studied extensively thereafter, e.g. [7,9,25,36,50,51,54]) provides a clear and applicable tool for message-passing concurrency. By exploiting the proof-theoretic equivalences between natural deduction and sequent calculus, we develop mutually inverse and fully abstract encodings, which naturally extend to more intricate settings such as process passing (in the sense of HOπ). Our encodings also enable us to derive properties of the π-calculi "for free". Specifically, we show how to obtain adequate representations of least and greatest fixed points in Polyπ through the encoding of initial and final (co)algebras in the λ-calculus. We also straightforwardly derive a strong normalisation result for the higher-order session calculus, which otherwise involves non-trivial proof techniques [5,7,12,13,36]. Future work includes extensions to the classical linear logic-based framework, including multiparty session types [10,11]. Encodings of session π-calculi to the λ-calculus have been used to implement session primitives in functional languages such as Haskell (see a recent survey [32]), OCaml [24,34,35] and Scala [48]. Following this line of work, we plan to develop encoding-based implementations of this work as embedded DSLs. This would potentially enable an exploration of algebraic constructs beyond initial and final (co)algebras in a session programming setting.
In particular, we wish to further study the meaning of functors, natural transformations and related constructions in a session-typed setting, both from a more fundamental viewpoint but also in terms of programming patterns.

**Acknowledgements.** The authors thank Viviana Bono, Dominic Orchard and the reviewers for their comments, suggestions and pointers to related works. This work is partially supported by EPSRC EP/K034413/1, EP/K011715/1, EP/L00058X/1, EP/N027833/1, EP/N028201/1 and NOVA LINCS (UID/CEC/04516/2013).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Concurrent Kleene Algebra: Free Model and Completeness**

Tobias Kapp´e(B), Paul Brunet, Alexandra Silva, and Fabio Zanasi

University College London, London, UK tkappe@cs.ucl.ac.uk

**Abstract.** Concurrent Kleene Algebra (CKA) was introduced by Hoare, Moeller, Struth and Wehrman in 2009 as a framework to reason about concurrent programs. We prove that the axioms for CKA with bounded parallelism are complete for the semantics proposed in the original paper; consequently, these semantics are the free model for this fragment. This result settles a conjecture of Hoare and collaborators. Moreover, the technique developed to this end allows us to establish a Kleene Theorem for CKA, extending an earlier Kleene Theorem for a fragment of CKA.

### **1 Introduction**

Concurrent Kleene Algebra (CKA) [8] is a mathematical formalism which extends Kleene Algebra (KA) with a parallel composition operator, in order to express concurrent program behaviour.<sup>1</sup> In spite of such a seemingly simple addition, extending the existing KA toolkit (notably, completeness) to the setting of CKA turned out to be a challenging task. A great deal of research has appeared since the original paper, both foundational [13,20] and on how CKA could be used to reason about important verification tasks in concurrent systems [9,11]. However, and despite several conjectures [9,13], the questions of characterising the free CKA and of the completeness of the axioms remained open, making it impractical to use CKA in verification tasks. This paper settles these two open questions. We positively answer the conjecture that the free model of CKA is formed by series-parallel pomset languages, downward-closed under Gischer's subsumption order [6], a generalisation of regular languages to sets of partially ordered words. To this end, we prove that the original axioms proposed in [8] are indeed complete.

Our proof of completeness is based on extending an existing completeness result that establishes series-parallel rational pomset languages as the free Bi-Kleene Algebra (BKA) [20]. The extension to the existing result for BKA provides a clear understanding of the difficulties introduced by the presence of the exchange axiom and shows how to separate concerns between CKA and BKA, a technique which is also useful elsewhere. For one, our construction also provides

© The Author(s) 2018

<sup>1</sup> In its original formulation, CKA also features an operator (*parallel star*) for unbounded parallelism: in harmony with several recent works [13,14], we study the variant of CKA without parallel star, sometimes called "weak" CKA.

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 856–882, 2018. https://doi.org/10.1007/978-3-319-89884-1\_30

an extension of (half of) Kleene's theorem for BKA [14] to CKA, establishing pomset automata as an operational model for CKA and opening the door to decidability procedures similar to those previously studied for KA. Furthermore, it reduces deciding the equational theory of CKA to deciding the equational theory of BKA.

BKA is defined as CKA with the single (but significant) omission of the *exchange law*, (e ∥ f) · (g ∥ h) ≤<sub>CKA</sub> (e · g) ∥ (f · h). The exchange law is the core element of CKA, as it softens true concurrency: it states that when two sequentially composed programs (i.e., e · g and f · h) are composed in parallel, they can be implemented by running their heads in parallel, followed by running their tails in parallel (i.e., e ∥ f, then g ∥ h). The exchange law allows the implementer of a CKA expression to interleave threads at will, without violating the specification.

To illustrate the use of the exchange law, consider a protocol with three actions: query a channel c, collect an answer from the same channel, and print an unrelated message m on screen. The specification for this protocol requires the query to happen before the reception of the message; the printing action, being independent, may be executed concurrently. We write this specification as (q(c) · r(c)) ∥ p(m), with the operator · denoting sequential composition. However, if one wants to implement this protocol in a sequential programming language, a total ordering of these events has to be introduced. Suppose we choose to implement this protocol by printing m while we wait to receive an answer. This implementation can be written q(c) · p(m) · r(c). Using the laws of CKA, we can prove that q(c) · p(m) · r(c) ≤<sub>CKA</sub> (q(c) · r(c)) ∥ p(m), which we interpret as the fact that this implementation respects the specification. Intuitively, this means that the specification lists the necessary dependencies, but the implementation may introduce more.
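Since every action in this protocol occurs exactly once, each pomset can be identified with its order relation on action labels; the claim that the implementation refines the specification then amounts to checking that the implementation's order contains the specification's. A small sketch of this check (not the axiomatic CKA proof):

```python
# Each action occurs once, so a pomset is just its set of causal links.
spec = {("q", "r")}                          # (q(c) · r(c)) ∥ p(m)
impl = {("q", "p"), ("p", "r"), ("q", "r")}  # q(c) · p(m) · r(c)

# The implementation keeps every causal link of the specification and
# only adds more, i.e., it is subsumed by the specification.
assert impl >= spec
```

The set-inclusion test stands in for Gischer's subsumption order, which is made precise in Sect. 3.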

Having a complete axiomatisation of CKA has two main benefits. First, it allows one to obtain certificates of correctness. Indeed, if one wants to use CKA for program verification, the decision procedure presented in [3] may be used to test program equivalence. If the test gives a negative answer, this algorithm provides a counter-example; if the answer is positive, however, no meaningful witness is produced. With the completeness result presented here, which is constructive in nature, one could generate an axiomatic proof of equivalence in these cases. Second, it gives one a simple way of checking when the aforementioned procedure applies. By construction, we know that two terms are semantically equivalent whenever they are equal in every concurrent Kleene algebra, that is, in any model of the axioms of CKA. This means that if we consider a specific semantic domain, one simply needs to check that the axioms of CKA hold in it to know that the decision procedure of [3] is sound in that model.

While this paper was being written, a manuscript with the same result appeared [19]. Among other things, the proof presented here differs in that it explicitly shows how to syntactically construct terms that express certain pomset languages, as opposed to showing that such terms must exist by reasoning on a semantic level. We refer to Sect. 5 for a more extensive comparison.

The remainder of this paper is organised as follows. In Sect. 2, we give an informal overview of the completeness proof. In Sect. 3, we introduce the necessary concepts, notation and lemmas. In Sect. 4, we work out the proof. We discuss the result in a broader perspective and outline further work in Sect. 5.

### **2 Overview of the Completeness Proof**

We start with an overview of the steps necessary to arrive at the main result. As mentioned, our strategy in tackling CKA-completeness is to build on the existing BKA-completeness result. Following an observation by Laurence and Struth, we identify *downward-closure* (under Gischer's subsumption order [6]) as the feature that distinguishes the pomsets giving semantics to BKA-expressions from those associated with CKA-expressions. In a slogan,

CKA-semantics = BKA-semantics + downward-closure.

This situation is depicted in the upper part of the commuting diagram in Fig. 1. Intuitively, downward-closure can be thought of as the semantic outcome of adding the exchange axiom, which distinguishes CKA from BKA. Thus, if a and b are events that can happen in parallel according to the BKA-semantics of a term, then a and b may also be ordered in the CKA-semantics of that same term.

**Fig. 1.** The connection between BKA and CKA semantics mediated by closure.

The core of our CKA-completeness proof will be to construct a syntactic counterpart to the semantic closure. Concretely, we shall build a function that maps a CKA term e to an equivalent term e↓, called the (syntactic) *closure* of e. The lower part of the commuting diagram in Fig. 1 shows the property that e↓ must satisfy in order to deserve the name of closure: its BKA semantics has to be the same as the CKA semantics of e.

*Example 2.1.* Consider e = a ∥ b, whose CKA-semantics prescribe that a and b are events that may happen in parallel. One closure of this term would be e↓ = a ∥ b + a · b + b · a, whose BKA-semantics stipulate that either a and b execute purely in parallel, or a precedes b, or b precedes a, thus matching the optional parallelism of a and b. For a more non-trivial example, take e = a<sup>∗</sup> ∥ b<sup>∗</sup>, which represents that finitely many repetitions of a and b occur, possibly in parallel. A closure of this term would be e↓ = (a<sup>∗</sup> ∥ b<sup>∗</sup>)<sup>∗</sup>: finitely many repetitions of a and b occur truly in parallel, and this is repeated indefinitely.
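In the first closure above, a and b each occur once, so the three candidate pomsets can be represented by their order relations alone; computing the downward closure of the BKA-semantics of a ∥ b then recovers exactly the BKA-semantics of a ∥ b + a · b + b · a. A small sketch of the slogan "CKA-semantics = BKA-semantics + downward-closure":

```python
# Pomsets over one occurrence of a and one of b, identified with their
# order relation on labels.
PAR = frozenset()                    # a ∥ b: no causal link
A_B = frozenset({("a", "b")})        # a · b
B_A = frozenset({("b", "a")})        # b · a
ALL = [PAR, A_B, B_A]

def subsumed(u, v):
    """u ⊑ v: u carries at least the causal links of v."""
    return u >= v

# The downward closure of {a ∥ b} under subsumption ...
closure = {u for u in ALL if subsumed(u, PAR)}
# ... is the BKA-semantics of a ∥ b + a·b + b·a.
assert closure == {PAR, A_B, B_A}
```

This representation only works because each label occurs once; the general definitions of Sect. 3 handle repeated labels via bijections on carriers.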

In order to find e↓ systematically, we are going to construct it in stages, through a completely syntactic procedure where each transformation has to be valid according to the axioms. There are three main stages.


As a straightforward consequence of the closure construction, we obtain a completeness theorem for CKA, which establishes the set of closed series-rational pomset languages as the free CKA.

### **3 Preliminaries**

We fix a finite set of symbols Σ, the *alphabet*. We use the symbols a, b and c to denote elements of Σ. The two-element set {0, 1} is denoted by 2. Given a set S, the set of subsets (*powerset*) of S is denoted by 2<sup>S</sup>.

In the interest of readability, the proofs for technical lemmas in this section can be found in the full version [15].

### **3.1 Pomsets**

A trace of a sequential program can be modelled as a word, where each letter represents an atomic event, and the order of the letters in the word represents the order in which the events took place. Analogously, a trace of a concurrent program can be thought of as a word in which the letters are partially ordered, i.e., there need not be a causal link between events. In the literature, such a partially ordered word is commonly called a *partial word* [7], or *partially ordered multiset* (*pomset*, for short) [6]; we use the latter term.

A formal definition of pomsets requires some work, because the partial order should order *occurrences* of events rather than the events themselves. For this reason, we first define a labelled poset.

**Definition 3.1.** *A* labelled poset *is a tuple* ⟨S, ≤, λ⟩*, where* ⟨S, ≤⟩ *is a partially ordered set (i.e.,* S *is a set and* ≤ *is a partial order on* S*), in which* S *is called the* carrier *and* ≤ *is the* order*;* λ : S → Σ *is a function called the* labelling*.*

We denote labelled posets with lower-case bold symbols **u**, **v**, et cetera. Given a labelled poset **u**, we write S**<sup>u</sup>** for its carrier, ≤**<sup>u</sup>** for its order and λ**<sup>u</sup>** for its labelling. We write **1** for the empty labelled poset. We say that two labelled posets are *disjoint* if their carriers are disjoint.

Disjoint labelled posets can be composed parallelly and sequentially; parallel composition simply juxtaposes the events, while sequential composition imposes an ordering between occurrences of events originating from the left operand and those originating from the right operand.

**Definition 3.2.** *Let* **u** *and* **v** *be disjoint. We write* **u** ∥ **v** *for the* parallel composition *of* **u** *and* **v***, which is the labelled poset with the carrier* S<sub>**u**∥**v**</sub> = S<sub>**u**</sub> ∪ S<sub>**v**</sub>*, the order* ≤<sub>**u**∥**v**</sub> = ≤<sub>**u**</sub> ∪ ≤<sub>**v**</sub> *and the labelling* λ<sub>**u**∥**v**</sub> *defined by*

$$
\lambda_{\mathbf{u}\parallel\mathbf{v}}(x) = \begin{cases}
\lambda_{\mathbf{u}}(x) & x \in S_{\mathbf{u}}; \\
\lambda_{\mathbf{v}}(x) & x \in S_{\mathbf{v}}.
\end{cases}
$$

*Similarly, we write $\mathbf{u} \cdot \mathbf{v}$ for the* sequential composition *of **u** and **v**, that is, the labelled poset with carrier $S_{\mathbf{u}} \cup S_{\mathbf{v}}$ and partial order*

$$
\leq_{\mathbf{u}\cdot\mathbf{v}} \;=\; \leq_{\mathbf{u}} \cup \leq_{\mathbf{v}} \cup (S_{\mathbf{u}} \times S_{\mathbf{v}}),
$$

*as well as the labelling $\lambda_{\mathbf{u}\cdot\mathbf{v}} = \lambda_{\mathbf{u}\parallel\mathbf{v}}$.*

Note that **1** is neutral for sequential and parallel composition, in the sense that $\mathbf{1} \parallel \mathbf{u} = \mathbf{1} \cdot \mathbf{u} = \mathbf{u} = \mathbf{u} \cdot \mathbf{1} = \mathbf{u} \parallel \mathbf{1}$.
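To make the two compositions of Definition 3.2 concrete, here is a minimal Python sketch. The representation (dicts with keys `carrier`, `order`, `label`, where `order` holds only the strict order pairs) is our own illustrative choice, not notation from the paper.

```python
# A labelled poset is a dict {"carrier": set, "order": set of pairs (strict
# part only; reflexivity is implied), "label": dict from events to letters}.

def atom(event, letter):
    """Single-event labelled poset, i.e., a primitive building block."""
    return {"carrier": {event}, "order": set(), "label": {event: letter}}

EMPTY = {"carrier": set(), "order": set(), "label": {}}  # the poset 1

def par(u, v):
    """Parallel composition: juxtapose the events, adding no new order."""
    assert not (u["carrier"] & v["carrier"]), "operands must be disjoint"
    return {
        "carrier": u["carrier"] | v["carrier"],
        "order": u["order"] | v["order"],
        "label": {**u["label"], **v["label"]},
    }

def seq(u, v):
    """Sequential composition: additionally order all of u before all of v."""
    w = par(u, v)
    w["order"] |= {(x, y) for x in u["carrier"] for y in v["carrier"]}
    return w
```

For instance, `seq(atom(0, "a"), atom(1, "b"))` yields the word-like poset with order `{(0, 1)}`, and composing it in parallel with `atom(2, "c")` adds the event `2` without ordering it against the others.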

There is a natural ordering between labelled posets with regard to concurrency.

**Definition 3.3.** *Let **u**, **v** be labelled posets. A* subsumption *from **u** to **v** is a bijection $h : S_{\mathbf{u}} \to S_{\mathbf{v}}$ that preserves order and labels, i.e., $u \leq_{\mathbf{u}} u'$ implies $h(u) \leq_{\mathbf{v}} h(u')$, and $\lambda_{\mathbf{v}} \circ h = \lambda_{\mathbf{u}}$. We simplify and write $h : \mathbf{u} \to \mathbf{v}$ for a subsumption from **u** to **v**. If such a subsumption exists, we write $\mathbf{v} \sqsubseteq \mathbf{u}$. Furthermore, $h$ is an* isomorphism *if both $h$ and its inverse $h^{-1}$ are subsumptions. If there exists an isomorphism from **u** to **v** we write $\mathbf{u} \cong \mathbf{v}$.*

Intuitively, if $\mathbf{u} \sqsubseteq \mathbf{v}$, then **u** and **v** both order the same set of (occurrences of) events, but **u** has more causal links, i.e., "is more sequential" than **v**. One easily sees that $\sqsubseteq$ is a preorder on labelled posets with finite carrier.
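For small carriers, Definition 3.3 can be checked by brute force: search for a label- and order-preserving bijection among all permutations. This is an illustrative sketch on the dict representation sketched earlier (`carrier`, `order`, `label`, with `order` holding the strict pairs), not the paper's construction.

```python
from itertools import permutations

def subsumed(v, u):
    """Decide v ⊑ u, i.e., whether a subsumption h : u -> v exists."""
    su, sv = sorted(u["carrier"]), sorted(v["carrier"])
    if len(su) != len(sv):
        return False
    for image in permutations(sv):
        h = dict(zip(su, image))
        # h must preserve labels and map every order pair of u into v's order
        if all(v["label"][h[x]] == u["label"][x] for x in su) and \
           all((h[x], h[y]) in v["order"] for (x, y) in u["order"]):
            return True
    return False
```

As a sanity check, the sequential pomset $ab$ is subsumed by the parallel pomset $a \parallel b$ (it is "more sequential"), but not vice versa.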

Since the actual contents of the carrier of a labelled poset do not matter, we can abstract from them using isomorphism. This gives rise to pomsets.

**Definition 3.4.** *A* pomset *is an isomorphism class of labelled posets, i.e., the class $[\mathbf{v}] \triangleq \{\mathbf{u} : \mathbf{u} \cong \mathbf{v}\}$ for some labelled poset **v**. Composition lifts to pomsets: we write $[\mathbf{u}] \parallel [\mathbf{v}]$ for $[\mathbf{u} \parallel \mathbf{v}]$ and $[\mathbf{u}] \cdot [\mathbf{v}]$ for $[\mathbf{u} \cdot \mathbf{v}]$. Similarly, subsumption lifts to pomsets: we write $[\mathbf{u}] \sqsubseteq [\mathbf{v}]$ precisely when $\mathbf{u} \sqsubseteq \mathbf{v}$.*

We denote pomsets by upper-case symbols $U$, $V$, et cetera. The *empty pomset*, i.e., $[\mathbf{1}] = \{\mathbf{1}\}$, is denoted by $1$; this pomset is neutral for sequential and parallel composition. To ensure that $[\mathbf{v}]$ is a set, we limit the discussion to labelled posets whose carrier is a subset of some fixed set $\mathsf{S}$. The labelled posets in this paper have finite carrier; it thus suffices to choose $\mathsf{S} = \mathbb{N}$ to represent all pomsets with finite (or even countably infinite) carrier.

Composition of pomsets is well-defined: if **u** and **v** are not disjoint, we can find $\mathbf{u}'$, $\mathbf{v}'$ disjoint from each other such that $\mathbf{u} \cong \mathbf{u}'$ and $\mathbf{v} \cong \mathbf{v}'$. The choice of representatives does not matter, for if $\mathbf{u} \cong \mathbf{u}'$ and $\mathbf{v} \cong \mathbf{v}'$, then $\mathbf{u} \cdot \mathbf{v} \cong \mathbf{u}' \cdot \mathbf{v}'$. Subsumption of pomsets is also well-defined: if $\mathbf{u}' \cong \mathbf{u} \sqsubseteq \mathbf{v} \cong \mathbf{v}'$, then $\mathbf{u}' \sqsubseteq \mathbf{v}'$. One easily sees that $\sqsubseteq$ is a partial order on finite pomsets, and that sequential and parallel composition are monotone with respect to $\sqsubseteq$, i.e., if $U \sqsubseteq W$ and $V \sqsubseteq X$, then $U \cdot V \sqsubseteq W \cdot X$ and $U \parallel V \sqsubseteq W \parallel X$. Lastly, we note that both types of composition are associative, on the level of pomsets as well as labelled posets; we therefore omit parentheses when no ambiguity is likely.

**Series-Parallel Pomsets.** If $a \in \Sigma$, we can construct a labelled poset with a single element labelled by $a$; since any two labelled posets thus constructed are isomorphic, we also use $a$ to denote the resulting isomorphism class; such a pomset is called a *primitive pomset*. A pomset built from primitive pomsets using sequential and parallel composition is called *series-parallel*; more formally:

**Definition 3.5.** *The set of* series-parallel *pomsets, denoted* SP(Σ)*, is the smallest set such that* 1 ∈ SP(Σ) *as well as* a ∈ SP(Σ) *for every* a ∈ Σ*, and is closed under parallel and sequential composition.*

We elide the sequential composition operator when we explicitly construct a pomset from primitive pomsets, i.e., we write ab instead of a · b for the pomset obtained by sequentially composing the (primitive) pomsets a and b. In this notation, sequential composition takes precedence over parallel composition.

All pomsets encountered in this paper are series-parallel. A useful feature of series-parallel pomsets is that we can deconstruct them in a standard fashion [6].

**Lemma 3.1.** *Let $U \in \mathsf{SP}(\Sigma)$. Then* exactly one *of the following is true: either (i) $U = 1$, or (ii) $U = a$ for some $a \in \Sigma$, or (iii) $U = U_0 \cdot U_1$ for $U_0, U_1 \in \mathsf{SP}(\Sigma) \setminus \{1\}$, or (iv) $U = U_0 \parallel U_1$ for $U_0, U_1 \in \mathsf{SP}(\Sigma) \setminus \{1\}$.*

In the sequel, it will be useful to refer to pomsets that are *not* of the third kind above, i.e., that cannot be written as $U_0 \cdot U_1$ for $U_0, U_1 \in \mathsf{SP}(\Sigma) \setminus \{1\}$, as *non-sequential* pomsets. Lemma 3.1 gives a normal form for series-parallel pomsets, as follows.

**Corollary 3.1.** *A pomset $U \in \mathsf{SP}(\Sigma)$ can be uniquely decomposed as $U = U_0 \cdot U_1 \cdots U_{n-1}$, where for all $0 \leq i < n$, $U_i$ is series-parallel and non-sequential.*
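The decomposition of Corollary 3.1 amounts to flattening sequential compositions. A small sketch on a term encoding of SP pomsets (our own illustrative encoding: an atom is a string, `("seq", ...)` and `("par", ...)` denote the two compositions; the empty pomset is not modelled here):

```python
def seq_factors(t):
    """Unique decomposition into maximal non-sequential factors."""
    if isinstance(t, tuple) and t[0] == "seq":
        # a sequential node contributes the factors of each of its children
        return [f for child in t[1:] for f in seq_factors(child)]
    return [t]  # atoms and parallel pomsets are non-sequential
```

For example, $a \cdot (b \cdot (c \parallel d)) \cdot e$ decomposes into the four non-sequential factors $a$, $b$, $c \parallel d$, $e$, regardless of how the sequential composition was parenthesised.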

**Factorisation.** We now go over some lemmas on pomsets that will allow us to factorise pomsets later on. First of all, one easily shows that subsumption is irrelevant on empty and primitive pomsets, as witnessed by the following lemma.

**Lemma 3.2.** *Let $U$ and $V$ be pomsets such that $U \sqsubseteq V$ or $V \sqsubseteq U$. If $U$ is empty or primitive, then $U = V$.*

We can also consider how pomset composition and subsumption relate. It is not hard to see that if a pomset is subsumed by a sequentially composed pomset, then this sequential composition also appears in the subsumed pomset. A similar statement holds for pomsets that subsume a parallel composition.

**Lemma 3.3 (Factorisation).** *Let $U$, $V_0$, and $V_1$ be pomsets such that $U$ is subsumed by $V_0 \cdot V_1$. Then there exist pomsets $U_0$ and $U_1$ such that:*

$$U = U_0 \cdot U_1, \quad U_0 \sqsubseteq V_0, \text{ and } U_1 \sqsubseteq V_1.$$

*Also, if $U_0$, $U_1$ and $V$ are pomsets such that $U_0 \parallel U_1 \sqsubseteq V$, then there exist pomsets $V_0$ and $V_1$ such that:*

$$V = V_0 \parallel V_1, \quad U_0 \sqsubseteq V_0, \text{ and } U_1 \sqsubseteq V_1.$$

The next lemma can be thought of as a generalisation of Levi's lemma [21], a well-known statement about words, to pomsets. It says that if a sequential composition is subsumed by another (possibly longer) sequential composition, then there must be a pomset "in the middle", describing the overlap between the two; this pomset gives rise to a factorisation.

**Lemma 3.4.** *Let $U$ and $V$ be pomsets, and let $W_0, W_1, \ldots, W_{n-1}$ with $n > 0$ be non-empty pomsets such that $U \cdot V \sqsubseteq W_0 \cdot W_1 \cdots W_{n-1}$. There exist $m < n$ and pomsets $Y, Z$ such that:*

$$Y \cdot Z \sqsubseteq W_m, \quad U \sqsubseteq W_0 \cdot W_1 \cdots W_{m-1} \cdot Y, \text{ and } V \sqsubseteq Z \cdot W_{m+1} \cdot W_{m+2} \cdots W_{n-1}.$$

*Moreover, if* U *and* V *are series-parallel, then so are* Y *and* Z*.*

Levi's lemma also has an analogue for parallel composition.

**Lemma 3.5.** *Let $U, V, W, X$ be pomsets such that $U \parallel V = W \parallel X$. There exist pomsets $Y_0, Y_1, Z_0, Z_1$ such that*

$$U = Y_0 \parallel Y_1, \quad V = Z_0 \parallel Z_1, \quad W = Y_0 \parallel Z_0, \text{ and } X = Y_1 \parallel Z_1.$$

The final lemma is useful when we have a sequentially composed pomset subsumed by a parallelly composed pomset. It tells us that we can factor the involved pomsets to find subsumptions between smaller pomsets. This lemma first appeared in [6], where it is called the interpolation lemma.

**Lemma 3.6 (Interpolation).** *Let $U, V, W, X$ be pomsets such that $U \cdot V$ is subsumed by $W \parallel X$. Then there exist pomsets $W_0, W_1, X_0, X_1$ such that*

$$W_0 \cdot W_1 \sqsubseteq W, \quad X_0 \cdot X_1 \sqsubseteq X, \quad U \sqsubseteq W_0 \parallel X_0, \text{ and } V \sqsubseteq W_1 \parallel X_1.$$

*Moreover, if $W$ and $X$ are series-parallel, then so are $W_0$, $W_1$, $X_0$ and $X_1$.*

On a semi-formal level, the interpolation lemma can be understood as follows. If $U \cdot V \sqsubseteq W \parallel X$, then the events in $W$ are partitioned between those that end up in $U$, and those that end up in $V$; these give rise to the "sub-pomsets" $W_0$ and $W_1$ of $W$, respectively. Similarly, $X$ partitions into "sub-pomsets" $X_0$ and $X_1$. We refer to Fig. 2 for a graphical depiction of this situation.

Now, if $y$ precedes $z$ in $W_0 \parallel X_0$, then $y$ must precede $z$ in $W \parallel X$, and therefore also in $U \cdot V$. Since $y$ and $z$ are both events in $U$, it then follows that $y$ precedes $z$ in $U$, establishing that $U \sqsubseteq W_0 \parallel X_0$. Furthermore, if $y$ precedes $z$ in $W$, then we can exclude the case where $y$ is in $W_1$ and $z$ in $W_0$, for then $z$ would precede $y$ in $U \cdot V$, contradicting that $y$ precedes $z$ in $U \cdot V$. Accordingly, either $y$ and $z$ both belong to $W_0$ or to $W_1$, or $y$ is in $W_0$ while $z$ is in $W_1$; in all of these cases, $y$ must precede $z$ in $W_0 \cdot W_1$. The other subsumptions hold analogously.

**Fig. 2.** Splitting pomsets in the interpolation lemma

**Pomset Languages.** The semantics of BKA and CKA are given in terms of sets of series-parallel pomsets.

**Definition 3.6.** *A subset of* SP(Σ) *is referred to as a* pomset language*.*

As a convention, we denote pomset languages by the symbols U, V, et cetera. Sequential and parallel composition of pomsets extends to pomset languages in a pointwise manner, i.e.,

$$\mathcal{U} \cdot \mathcal{V} \triangleq \{ U \cdot V : U \in \mathcal{U}, V \in \mathcal{V} \}$$

and similarly for parallel composition. Like languages of words, pomset languages admit a Kleene star operator, which is defined similarly, i.e., $\mathcal{U}^\star \triangleq \bigcup_{n \in \mathbb{N}} \mathcal{U}^n$, where the $n$th power of $\mathcal{U}$ is inductively defined as $\mathcal{U}^0 \triangleq \{1\}$ and $\mathcal{U}^{n+1} \triangleq \mathcal{U}^n \cdot \mathcal{U}$.
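The pointwise lifting and the inductive powers can be sketched directly; since $\mathcal{U}^\star$ is in general infinite, the sketch below only computes a finite approximation up to a given power. Pomsets are encoded here as nested tuples (our own illustrative choice), with `()` standing for the empty pomset $1$.

```python
def p_seq(u, v):
    """Sequential composition of two pomsets; () is the unit."""
    if u == ():
        return v
    if v == ():
        return u
    return ("seq", u, v)

def lang_seq(U, V):
    """Pointwise sequential composition of pomset languages."""
    return {p_seq(u, v) for u in U for v in V}

def power(U, n):
    """U^0 = {1}, U^(n+1) = U^n · U."""
    W = {()}
    for _ in range(n):
        W = lang_seq(W, U)
    return W

def star_upto(U, n):
    """Finite approximation of U* = union of U^k for k <= n."""
    result = set()
    for k in range(n + 1):
        result |= power(U, k)
    return result
```

For `U = {"a", "b"}`, `star_upto(U, 2)` contains the empty pomset, both letters, and the four two-letter sequences, i.e., seven pomsets in total.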

A pomset language $\mathcal{U}$ is *closed under subsumption* (or simply *closed*) if, whenever $U \in \mathcal{U}$ with $U' \sqsubseteq U$ and $U' \in \mathsf{SP}(\Sigma)$, it holds that $U' \in \mathcal{U}$. The *closure under subsumption* (or simply *closure*) of a pomset language $\mathcal{U}$, denoted $\mathcal{U}{\downarrow}$, is defined as the smallest closed pomset language that contains $\mathcal{U}$, i.e.,

$$\mathcal{U}{\downarrow} \triangleq \{U' \in \mathsf{SP}(\Sigma) : \exists U \in \mathcal{U}.\ U' \sqsubseteq U\}.$$

Closure relates to union, sequential composition and iteration as follows.

**Lemma 3.7.** *Let* U, V *be pomset languages; then:*

$$(\mathcal{U} \cup \mathcal{V}){\downarrow} = \mathcal{U}{\downarrow} \cup \mathcal{V}{\downarrow}, \quad (\mathcal{U} \cdot \mathcal{V}){\downarrow} = \mathcal{U}{\downarrow} \cdot \mathcal{V}{\downarrow}, \quad \text{and} \quad \mathcal{U}^\star{\downarrow} = \mathcal{U}{\downarrow}^\star.$$

*Proof.* The first claim holds for infinite unions, too, and follows immediately from the definition of closure.

For the second claim, suppose that $U \in \mathcal{U}$ and $V \in \mathcal{V}$, and that $W \sqsubseteq U \cdot V$. By Lemma 3.3, we find pomsets $W_0$ and $W_1$ such that $W = W_0 \cdot W_1$, with $W_0 \sqsubseteq U$ and $W_1 \sqsubseteq V$. It then holds that $W_0 \in \mathcal{U}{\downarrow}$ and $W_1 \in \mathcal{V}{\downarrow}$, meaning that $W = W_0 \cdot W_1 \in \mathcal{U}{\downarrow} \cdot \mathcal{V}{\downarrow}$. This shows that $(\mathcal{U} \cdot \mathcal{V}){\downarrow} \subseteq \mathcal{U}{\downarrow} \cdot \mathcal{V}{\downarrow}$. Proving the reverse inclusion is a simple matter of unfolding the definitions.

For the third claim, we can calculate directly using the first and second parts of this lemma:

$$\mathcal{U}^\star{\downarrow} = \Bigl(\,\bigcup_{n\in\mathbb{N}}\underbrace{\mathcal{U}\cdot\mathcal{U}\cdots\mathcal{U}}_{n\text{ times}}\Bigr){\downarrow} = \bigcup_{n\in\mathbb{N}}\Bigl(\underbrace{\mathcal{U}\cdot\mathcal{U}\cdots\mathcal{U}}_{n\text{ times}}\Bigr){\downarrow} = \bigcup_{n\in\mathbb{N}}\underbrace{\mathcal{U}{\downarrow}\cdot\mathcal{U}{\downarrow}\cdots\mathcal{U}{\downarrow}}_{n\text{ times}} = \mathcal{U}{\downarrow}^\star$$

#### **3.2 Concurrent Kleene Algebra**

We now consider two extensions of Kleene Algebra (KA), known as *Bi-Kleene Algebra* (BKA) and *Concurrent Kleene Algebra* (CKA). Both extend KA with an operator for parallel composition and thus share a common syntax.

**Definition 3.7.** *The set $\mathcal{T}$ is the smallest set generated by the grammar*

$$e, f ::= 0 \mid 1 \mid a \in \Sigma \mid e + f \mid e \cdot f \mid e \parallel f \mid e^\star$$

The BKA-semantics of a term is a straightforward inductive application of the operators on the level of pomset languages. The CKA-semantics of a term is the BKA-semantics, downward-closed under the subsumption order; the CKA-semantics thus includes all possible sequentialisations.

**Definition 3.8.** *The function $[\![-]\!]_{\mathsf{BKA}} : \mathcal{T} \to 2^{\mathsf{SP}(\Sigma)}$ is defined as follows:*

$$
\begin{aligned}
[\![0]\!]_{\mathsf{BKA}} &\triangleq \emptyset &
[\![e+f]\!]_{\mathsf{BKA}} &\triangleq [\![e]\!]_{\mathsf{BKA}} \cup [\![f]\!]_{\mathsf{BKA}} &
[\![e^\star]\!]_{\mathsf{BKA}} &\triangleq [\![e]\!]_{\mathsf{BKA}}^\star \\
[\![1]\!]_{\mathsf{BKA}} &\triangleq \{1\} &
[\![e \cdot f]\!]_{\mathsf{BKA}} &\triangleq [\![e]\!]_{\mathsf{BKA}} \cdot [\![f]\!]_{\mathsf{BKA}} \\
[\![a]\!]_{\mathsf{BKA}} &\triangleq \{a\} &
[\![e \parallel f]\!]_{\mathsf{BKA}} &\triangleq [\![e]\!]_{\mathsf{BKA}} \parallel [\![f]\!]_{\mathsf{BKA}}
\end{aligned}
$$

*Finally, $[\![-]\!]_{\mathsf{CKA}} : \mathcal{T} \to 2^{\mathsf{SP}(\Sigma)}$ is defined as $[\![e]\!]_{\mathsf{CKA}} \triangleq [\![e]\!]_{\mathsf{BKA}}{\downarrow}$.*

Following Lodaya and Weil [22], if $\mathcal{U}$ is a pomset language such that $\mathcal{U} = [\![e]\!]_{\mathsf{BKA}}$ for some $e \in \mathcal{T}$, we say that the language $\mathcal{U}$ is *series-rational*. Note that if $\mathcal{U}$ is such that $\mathcal{U} = [\![e]\!]_{\mathsf{CKA}}$ for some term $e \in \mathcal{T}$, then $\mathcal{U}$ is closed by definition.
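The inductive clauses of Definition 3.8 can be sketched as an interpreter for star-free terms (the star case is omitted, since $[\![e^\star]\!]_{\mathsf{BKA}}$ is infinite). The pomset normal form below, a tuple of non-sequential "blocks" where a block is a letter or a sorted `("par", ...)` node, is our own illustrative encoding, chosen so that set equality respects commutativity and associativity of $\parallel$.

```python
def pseq(u, v):
    """Sequential composition: concatenate block sequences."""
    return u + v

def ppar(u, v):
    """Parallel composition: flatten and sort the parallel components."""
    def comps(p):
        if len(p) == 1 and isinstance(p[0], tuple):  # p is a single ∥-block
            return list(p[0][1])
        return [p] if p else []
    cs = comps(u) + comps(v)
    if not cs:
        return ()
    if len(cs) == 1:
        return cs[0]
    return (("par", tuple(sorted(cs, key=repr))),)

def bka(e):
    """[[e]]_BKA for star-free terms encoded as tuples."""
    op = e[0]
    if op == "0":
        return set()
    if op == "1":
        return {()}          # the empty pomset
    if op == "atom":
        return {(e[1],)}     # a primitive pomset
    if op == "+":
        return bka(e[1]) | bka(e[2])
    if op == ".":
        return {pseq(u, v) for u in bka(e[1]) for v in bka(e[2])}
    if op == "par":
        return {ppar(u, v) for u in bka(e[1]) for v in bka(e[2])}
    raise ValueError(op)
```

For instance, $[\![a \parallel b]\!]_{\mathsf{BKA}}$ and $[\![b \parallel a]\!]_{\mathsf{BKA}}$ evaluate to the same singleton set, while $[\![a \cdot b + a \parallel b]\!]_{\mathsf{BKA}}$ has two elements, reflecting that the BKA-semantics does not close under subsumption.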

To axiomatise semantic equivalence between terms, we build the following relations, which match the axioms proposed in [20]. The axioms of CKA as defined in [8] come from a double quantale structure mediated by the exchange law; these imply the ones given here. The converse implication does not hold; in particular, our syntax does not include an infinitary greatest lower bound operator. However, BKA (as defined in this paper) does have a *finitary* greatest lower bound [20], and by the existence of closure, so does CKA.

**Definition 3.9.** *The relation* ≡BKA *is the smallest congruence on* T *(with respect to all operators) such that for all* e, f, g ∈ T *:*

$$
\begin{gathered}
e + 0 \equiv_{\mathsf{BKA}} e \qquad e + e \equiv_{\mathsf{BKA}} e \qquad e + f \equiv_{\mathsf{BKA}} f + e \qquad e + (f + g) \equiv_{\mathsf{BKA}} (e + f) + g \\
e \cdot 1 \equiv_{\mathsf{BKA}} e \qquad 1 \cdot e \equiv_{\mathsf{BKA}} e \qquad e \cdot (f \cdot g) \equiv_{\mathsf{BKA}} (e \cdot f) \cdot g \\
e \cdot 0 \equiv_{\mathsf{BKA}} 0 \equiv_{\mathsf{BKA}} 0 \cdot e \qquad e \cdot (f + g) \equiv_{\mathsf{BKA}} e \cdot f + e \cdot g \qquad (e + f) \cdot g \equiv_{\mathsf{BKA}} e \cdot g + f \cdot g \\
e \parallel f \equiv_{\mathsf{BKA}} f \parallel e \qquad e \parallel 1 \equiv_{\mathsf{BKA}} e \qquad e \parallel (f \parallel g) \equiv_{\mathsf{BKA}} (e \parallel f) \parallel g \\
e \parallel 0 \equiv_{\mathsf{BKA}} 0 \qquad e \parallel (f + g) \equiv_{\mathsf{BKA}} e \parallel f + e \parallel g \qquad 1 + e \cdot e^\star \equiv_{\mathsf{BKA}} e^\star \\
e + f \cdot g \leqq_{\mathsf{BKA}} g \implies f^\star \cdot e \leqq_{\mathsf{BKA}} g
\end{gathered}
$$

*in which we use $e \leqq_{\mathsf{BKA}} f$ as a shorthand for $e + f \equiv_{\mathsf{BKA}} f$. The final (conditional) axiom is referred to as the* least fixpoint axiom*.*

*The relation* ≡CKA *is the smallest congruence on* T *that satisfies the rules of* ≡BKA*, and furthermore satisfies the* exchange law *for all* e, f, g, h ∈ T *:*

$$(e \parallel f) \cdot (g \parallel h) \leqq_{\mathsf{CKA}} (e \cdot g) \parallel (f \cdot h)$$

*where we similarly use $e \leqq_{\mathsf{CKA}} f$ as a shorthand for $e + f \equiv_{\mathsf{CKA}} f$.*

We can see that $\equiv_{\mathsf{BKA}}$ includes the familiar axioms of KA, and additionally stipulates that $\parallel$ is commutative and associative with unit $1$ and annihilator $0$, as well as distributive over $+$. When using CKA to model concurrent program flow, the exchange law models sequentialisation: if we have two programs, the first of which executes $e$ followed by $g$, and the second of which executes $f$ followed by $h$, then we can sequentialise this by executing $e$ and $f$ in parallel, followed by executing $g$ and $h$ in parallel.

We use the symbol $\mathsf{T}$ in statements that are true for $\mathsf{T} \in \{\mathsf{BKA}, \mathsf{CKA}\}$. The relation $\equiv_{\mathsf{T}}$ is sound for equivalence of terms under $[\![-]\!]_{\mathsf{T}}$ [13].

**Lemma 3.8.** *Let $e, f \in \mathcal{T}$. If $e \equiv_{\mathsf{T}} f$, then $[\![e]\!]_{\mathsf{T}} = [\![f]\!]_{\mathsf{T}}$.*

Since all binary operators are associative (up to $\equiv_{\mathsf{T}}$), we drop parentheses when writing terms like $e + f + g$; this does not incur ambiguity with regard to $[\![-]\!]_{\mathsf{T}}$. We furthermore consider $\cdot$ to have precedence over $\parallel$, which has precedence over $+$; as usual, the Kleene star has the highest precedence of all operators. For instance, when we write $e + f \cdot g^\star \parallel h$, this should be read as $e + ((f \cdot (g^\star)) \parallel h)$.

In the case of BKA, the implication in Lemma 3.8 is an equivalence [20], and thus gives a complete axiomatisation of semantic BKA-equivalence of terms.<sup>2</sup>

**Theorem 3.1.** *Let $e, f \in \mathcal{T}$. Then $e \equiv_{\mathsf{BKA}} f$ if and only if $[\![e]\!]_{\mathsf{BKA}} = [\![f]\!]_{\mathsf{BKA}}$.*

Given a term e ∈ T , we can determine syntactically whether its (BKA or CKA) semantics contains the empty pomset, using the function defined below.

<sup>2</sup> Strictly speaking, the proof in [20] includes the parallel star operator in BKA. Since this is a conservative extension of BKA, this proof applies to BKA as well.

**Definition 3.10.** *The* nullability *function $\epsilon : \mathcal{T} \to 2$ is defined as follows:*

$$
\begin{aligned}
\epsilon(0) &\triangleq 0 & \epsilon(e + f) &\triangleq \epsilon(e) \vee \epsilon(f) & \epsilon(e^\star) &\triangleq 1 \\
\epsilon(1) &\triangleq 1 & \epsilon(e \cdot f) &\triangleq \epsilon(e) \wedge \epsilon(f) \\
\epsilon(a) &\triangleq 0 & \epsilon(e \parallel f) &\triangleq \epsilon(e) \wedge \epsilon(f)
\end{aligned}
$$

*in which $\vee$ and $\wedge$ are understood as the usual lattice operations on $2$.*

That $\epsilon$ encodes the presence of $1$ in the semantics is witnessed by the following.

**Lemma 3.9.** *Let $e \in \mathcal{T}$. Then $\epsilon(e) \leqq_{\mathsf{T}} e$, and $1 \in [\![e]\!]_{\mathsf{T}}$ if and only if $\epsilon(e) = 1$.*
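Nullability is a one-line recursion over the term structure. A sketch on tuple-encoded terms (the encoding `("atom", a)`, `("+", e, f)`, `(".", e, f)`, `("par", e, f)`, `("*", e)`, `("0",)`, `("1",)` is our own illustrative choice):

```python
def eps(e):
    """Nullability: 1 iff the empty pomset is in the semantics of e."""
    op = e[0]
    if op == "0" or op == "atom":
        return 0
    if op == "1" or op == "*":
        return 1
    if op == "+":
        return eps(e[1]) | eps(e[2])        # join on the lattice 2
    if op == "." or op == "par":
        return eps(e[1]) & eps(e[2])        # meet on the lattice 2
    raise ValueError(op)
```

For example, $\epsilon(a^\star \cdot b) = 1 \wedge 0 = 0$, matching the fact that every pomset in $[\![a^\star \cdot b]\!]_{\mathsf{T}}$ contains a $b$-labelled event.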

In the sequel, we need the *(parallel) width* of a term. This is defined as follows.

**Definition 3.11.** *Let $e \in \mathcal{T}$. The* (parallel) width *of $e$, denoted by $|e|$, is defined as $0$ when $e \equiv_{\mathsf{BKA}} 0$; for all other cases, it is defined inductively, as follows:*

$$
|1| \triangleq 0 \qquad |a| \triangleq 1 \qquad |e + f| \triangleq \max(|e|, |f|) \qquad |e \cdot f| \triangleq \max(|e|, |f|) \qquad |e \parallel f| \triangleq |e| + |f| \qquad |e^\star| \triangleq |e|
$$
The width of a term is invariant with respect to equivalence of terms.

**Lemma 3.10.** *Let* e, f ∈ T *. If* e ≡BKA f*, then* |e| = |f|*.*

The width of a term is related to its semantics as demonstrated below.

**Lemma 3.11.** *Let $e \in \mathcal{T}$, and let $U \in [\![e]\!]_{\mathsf{BKA}}$ be such that $U \neq 1$. Then $|e| > 0$.*

#### **3.3 Linear Systems**

KA is equipped to find the least solutions to linear inequations. For instance, if we want to find $X$ such that $e \cdot X + f \leqq_{\mathsf{KA}} X$, it is not hard to show that $e^\star \cdot f$ is the *least solution* for $X$, in the sense that this choice of $X$ satisfies the inequation, and for any choice of $X$ that also satisfies this inequation it holds that $e^\star \cdot f \leqq_{\mathsf{KA}} X$. Since KA is contained in BKA and CKA, the same constructions also apply there. These axioms generalise to systems of linear inequations in a straightforward manner; indeed, Kozen [18] exploited this generalisation to axiomatise KA. In this paper, we use systems of linear inequations to construct particular expressions. To do this, we introduce vectors and matrices of terms.

For the remainder of this section, we fix I as a finite set.

**Definition 3.12.** *An* $I$-vector *is a function from $I$ to $\mathcal{T}$. Addition of $I$-vectors is defined pointwise, i.e., if $p$ and $q$ are $I$-vectors, then $p + q$ is the $I$-vector defined for $i \in I$ by $(p + q)(i) \triangleq p(i) + q(i)$.*

*An* $I$-matrix *is a function from $I^2$ to $\mathcal{T}$. Left-multiplication of an $I$-vector by an $I$-matrix is defined in the usual fashion, i.e., if $M$ is an $I$-matrix and $p$ is an $I$-vector, then $M \cdot p$ is the $I$-vector defined for $i \in I$ by*

$$(M \cdot p)(i) \triangleq \sum_{j \in I} M(i, j) \cdot p(j)$$

Equivalence between terms extends pointwise to $I$-vectors. More precisely, we write $p \equiv_{\mathsf{T}} q$ for $I$-vectors $p$ and $q$ when $p(i) \equiv_{\mathsf{T}} q(i)$ for all $i \in I$, and $p \leqq_{\mathsf{T}} q$ when $p + q \equiv_{\mathsf{T}} q$.

**Definition 3.13.** *An* $I$-linear system $L$ *is a pair $\langle M, p \rangle$ where $M$ is an $I$-matrix and $p$ is an $I$-vector. A* solution *to $L$ in $\mathsf{T}$ is an $I$-vector $s$ such that $M \cdot s + p \leqq_{\mathsf{T}} s$. A* least solution *to $L$ in $\mathsf{T}$ is a solution $s$ in $\mathsf{T}$ such that for any solution $t$ in $\mathsf{T}$ it holds that $s \leqq_{\mathsf{T}} t$.*

It is not very hard to show that least solutions of a linear system are unique up to $\equiv_{\mathsf{T}}$; we therefore speak of *the* least solution of a linear system.

Interestingly, *any* I-linear system has a least solution, and one can construct this solution using only the operators of KA. The construction proceeds by induction on |I|. In the base, where I is empty, the solution is trivial; for the inductive step it suffices to reduce the problem to finding the least solution of a strictly smaller linear system. This construction is not unlike Kleene's procedure to obtain a regular expression from a finite automaton [17]. Alternatively, we can regard the existence of least solutions as a special case of Kozen's proof of the fixpoint for matrices over a KA, as seen in [18, Lemma 9].

As a matter of fact, because this construction uses the axioms of KA exclusively, the least solution that is constructed is the same for both BKA and CKA.

**Lemma 3.12.** *Let* L *be an* I*-linear system. One can construct a single* I*vector* x *that is the least solution to* L *in both* BKA *and* CKA*.*

We include a full proof of the lemma above using the notation of this paper in the full version of this paper [15].
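The inductive construction behind Lemma 3.12 can be sketched symbolically: eliminate one variable using the single-inequation solution $x = M_{kk}^\star \cdot (\ldots)$, solve the smaller system, and substitute back. The sketch below works over indices $0, \ldots, n-1$ and builds terms as plain strings; the representation and function names are illustrative assumptions, not the paper's notation, and no simplification of the resulting terms is attempted.

```python
def star(e):
    return f"({e})*"

def dot(e, f):
    return f"({e}·{f})"

def plus(e, f):
    return f"({e}+{f})"

def least_solution(M, p):
    """Least solution of x >= M·x + p, built using only KA operators."""
    n = len(p)
    if n == 0:
        return []
    if n == 1:
        return [dot(star(M[0][0]), p[0])]          # x = M00* · p0
    k = n - 1
    sk = star(M[k][k])
    # Substitute x_k = Mkk*·(sum_j<k Mkj·x_j + pk) into the other rows,
    # yielding a strictly smaller linear system.
    M2 = [[plus(M[i][j], dot(M[i][k], dot(sk, M[k][j])))
           for j in range(k)] for i in range(k)]
    p2 = [plus(p[i], dot(M[i][k], dot(sk, p[k]))) for i in range(k)]
    sol = least_solution(M2, p2)
    # Recover x_k from the solutions of the smaller system.
    xk = dot(sk, p[k])
    for j in range(k):
        xk = plus(xk, dot(sk, dot(M[k][j], sol[j])))
    return sol + [xk]
```

In the one-variable case this reproduces the familiar least solution $e^\star \cdot f$ of $e \cdot X + f \leqq X$; since only KA operators are used, the same vector works in both BKA and CKA.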

### **4 Completeness of CKA**

We now turn our attention to proving that $\equiv_{\mathsf{CKA}}$ is complete for CKA-semantic equivalence of terms, i.e., that if $e, f \in \mathcal{T}$ are such that $[\![e]\!]_{\mathsf{CKA}} = [\![f]\!]_{\mathsf{CKA}}$, then $e \equiv_{\mathsf{CKA}} f$. In the interest of readability, proofs of technical lemmas in this section can be found in the full version of this paper [15].

As mentioned before, our proof of completeness is based on the completeness result for BKA reproduced in Theorem 3.1. Recall that $[\![e]\!]_{\mathsf{CKA}} = [\![e]\!]_{\mathsf{BKA}}{\downarrow}$. To reuse completeness of BKA, we construct a syntactic variant of the closure operator, which is formalised below.

**Definition 4.1.** *Let $e \in \mathcal{T}$. We say that $e{\downarrow}$ is a* closure *of $e$ if both $e \equiv_{\mathsf{CKA}} e{\downarrow}$ and $[\![e{\downarrow}]\!]_{\mathsf{BKA}} = [\![e]\!]_{\mathsf{BKA}}{\downarrow}$ hold.*

*Example 4.1.* Let $e = a \parallel b$; as proposed in Sect. 2, we claim that $e{\downarrow} = a \parallel b + b \cdot a + a \cdot b$ is a closure of $e$. To see why, first note that $e \leqq_{\mathsf{CKA}} e{\downarrow}$ by construction. Furthermore,

$$a \cdot b \equiv_{\mathsf{CKA}} (a \parallel 1) \cdot (1 \parallel b) \leqq_{\mathsf{CKA}} (a \cdot 1) \parallel (1 \cdot b) \equiv_{\mathsf{CKA}} a \parallel b$$

and similarly $b \cdot a \leqq_{\mathsf{CKA}} e$; thus, $e \equiv_{\mathsf{CKA}} e{\downarrow}$. Lastly, the pomsets in $[\![e]\!]_{\mathsf{BKA}}{\downarrow}$ and $[\![e{\downarrow}]\!]_{\mathsf{BKA}}$ are simply $a \parallel b$, $ab$ and $ba$, and therefore $[\![e{\downarrow}]\!]_{\mathsf{BKA}} = [\![e]\!]_{\mathsf{BKA}}{\downarrow}$.

Laurence and Struth observed that the existence of a closure for every term implies a completeness theorem for CKA, as follows.

**Lemma 4.1.** *Suppose that we can construct a closure for every element of $\mathcal{T}$. If $e, f \in \mathcal{T}$ are such that $[\![e]\!]_{\mathsf{CKA}} = [\![f]\!]_{\mathsf{CKA}}$, then $e \equiv_{\mathsf{CKA}} f$.*

*Proof.* Since $[\![e]\!]_{\mathsf{CKA}} = [\![e]\!]_{\mathsf{BKA}}{\downarrow} = [\![e{\downarrow}]\!]_{\mathsf{BKA}}$ and similarly $[\![f]\!]_{\mathsf{CKA}} = [\![f{\downarrow}]\!]_{\mathsf{BKA}}$, we have $[\![e{\downarrow}]\!]_{\mathsf{BKA}} = [\![f{\downarrow}]\!]_{\mathsf{BKA}}$. By Theorem 3.1, we get $e{\downarrow} \equiv_{\mathsf{BKA}} f{\downarrow}$, and thus $e{\downarrow} \equiv_{\mathsf{CKA}} f{\downarrow}$, since all axioms of BKA are also axioms of CKA. By $e \equiv_{\mathsf{CKA}} e{\downarrow}$ and $f{\downarrow} \equiv_{\mathsf{CKA}} f$, we can then conclude that $e \equiv_{\mathsf{CKA}} f$.

The remainder of this section is dedicated to showing that the premise of Lemma 4.1 holds. We do this by explicitly constructing a closure $e{\downarrow}$ for every $e \in \mathcal{T}$. First, we note that a closure can be constructed for the base terms.

**Lemma 4.2.** *Let* e ∈ 2 *or* e = a *for some* a ∈ Σ*. Then* e *is a closure of itself.*

Furthermore, closures can be constructed compositionally for all operators except parallel composition, in the following sense.

**Lemma 4.3.** *Suppose that $e_0, e_1 \in \mathcal{T}$, and that $e_0$ and $e_1$ have closures $e_0{\downarrow}$ and $e_1{\downarrow}$. Then (i) $e_0{\downarrow} + e_1{\downarrow}$ is a closure of $e_0 + e_1$, (ii) $e_0{\downarrow} \cdot e_1{\downarrow}$ is a closure of $e_0 \cdot e_1$, and (iii) $(e_0{\downarrow})^\star$ is a closure of $e_0^\star$.*

*Proof.* Since $e_0{\downarrow} \equiv_{\mathsf{CKA}} e_0$ and $e_1{\downarrow} \equiv_{\mathsf{CKA}} e_1$, by the fact that $\equiv_{\mathsf{CKA}}$ is a congruence we obtain $e_0{\downarrow} + e_1{\downarrow} \equiv_{\mathsf{CKA}} e_0 + e_1$. Similar observations hold for the other operators. We conclude using Lemma 3.7.

It remains to consider the case where $e = e_0 \parallel e_1$. In doing so, our induction hypothesis is that any $f \in \mathcal{T}$ with $|f| < |e_0 \parallel e_1|$ has a closure, as does any strict subterm of $e_0 \parallel e_1$.

### **4.1 Preclosure**

To get to a closure of a parallel composition, we first need an operator on terms that is not a closure quite yet, but whose BKA-semantics is "closed enough" to cover the non-sequential elements of the CKA-semantics of the term.

**Definition 4.2.** *Let $e \in \mathcal{T}$. A* preclosure *of $e$ is a term $\tilde{e} \in \mathcal{T}$ such that $\tilde{e} \equiv_{\mathsf{CKA}} e$. Moreover, if $U \in [\![e]\!]_{\mathsf{CKA}}$ is non-sequential, then $U \in [\![\tilde{e}]\!]_{\mathsf{BKA}}$.*

*Example 4.2.* Suppose that $e_0 \parallel e_1 = (a \parallel b) \parallel c$. A preclosure of $e_0 \parallel e_1$ could be

$$\tilde{e} = a \parallel b \parallel c + (a \cdot b + b \cdot a) \parallel c + (b \cdot c + c \cdot b) \parallel a + (a \cdot c + c \cdot a) \parallel b$$

To verify this, note that $e \leqq_{\mathsf{CKA}} \tilde{e}$ by construction; it remains to show that $\tilde{e} \leqq_{\mathsf{CKA}} e$. This is fairly straightforward: since $a \cdot b + b \cdot a \leqq_{\mathsf{CKA}} a \parallel b$, we have $(a \cdot b + b \cdot a) \parallel c \leqq_{\mathsf{CKA}} e$; the other terms are treated similarly. Consequently, $e \equiv_{\mathsf{CKA}} \tilde{e}$. Furthermore, there are seven non-sequential pomsets in $[\![e]\!]_{\mathsf{CKA}}$; they are

$$a \parallel b \parallel c \qquad ab \parallel c \qquad ba \parallel c \qquad bc \parallel a \qquad cb \parallel a \qquad ac \parallel b \qquad ca \parallel b$$

Each of these pomsets is found in $[\![\tilde{e}]\!]_{\mathsf{BKA}}$. It should be noted that $\tilde{e}$ is *not* a closure of $e$; to see this, consider for instance that $abc \in [\![e]\!]_{\mathsf{CKA}}$, while $abc \notin [\![\tilde{e}]\!]_{\mathsf{BKA}}$.

The remainder of this section is dedicated to showing that, under the induction hypothesis, we can construct a preclosure for any parallel composition. This is not perfectly straightforward; for instance, consider the term $e_0 \parallel e_1$ discussed in Example 4.2. At first glance, one might be tempted to choose $e_0{\downarrow} \parallel e_1{\downarrow}$ as a preclosure, since $e_0{\downarrow}$ and $e_1{\downarrow}$ exist by the induction hypothesis. In that case, $e_0{\downarrow} = a \parallel b + a \cdot b + b \cdot a$ is a closure of $e_0$. Furthermore, $e_1{\downarrow} = c$ is a closure of $e_1$, by Lemma 4.2. However, $e_0{\downarrow} \parallel e_1{\downarrow}$ is not a preclosure of $e_0 \parallel e_1$, since $(a \cdot c) \parallel b$ is non-sequential and found in $[\![e_0 \parallel e_1]\!]_{\mathsf{CKA}}$, but not in $[\![e_0{\downarrow} \parallel e_1{\downarrow}]\!]_{\mathsf{BKA}}$.

The problem is that the preclosure of $e_0$ and $e_1$ should also allow (partial) sequentialisation of *parallel parts* of $e_0$ and $e_1$; in this case, we need to sequentialise the $a$ part of $a \parallel b$ with $c$, and leave $b$ untouched. To do so, we need to be able to *split* $e_0 \parallel e_1$ into pairs of constituent terms, each of which represents a possible way to divvy up its parallel parts. For instance, we can split $e_0 \parallel e_1 = (a \parallel b) \parallel c$ in parallel into $a \parallel b$ and $c$, but also into $a$ and $b \parallel c$, or into $a \parallel c$ and $b$. The definition below formalises this procedure.

**Definition 4.3.** *Let $e \in \mathcal{T}$; $\Delta_e$ is the smallest relation on $\mathcal{T}$ such that*

$$
\frac{}{1 \mathrel{\Delta_e} e} \qquad
\frac{}{e \mathrel{\Delta_e} 1} \qquad
\frac{\ell \mathrel{\Delta_{e_0}} r}{\ell \mathrel{\Delta_{e_0 + e_1}} r} \qquad
\frac{\ell \mathrel{\Delta_{e_1}} r}{\ell \mathrel{\Delta_{e_0 + e_1}} r} \qquad
\frac{\ell \mathrel{\Delta_{e_0}} r \quad \epsilon(e_1) = 1}{\ell \mathrel{\Delta_{e_0 \cdot e_1}} r}
$$

$$
\frac{\ell \mathrel{\Delta_{e_1}} r \quad \epsilon(e_0) = 1}{\ell \mathrel{\Delta_{e_0 \cdot e_1}} r} \qquad
\frac{\ell \mathrel{\Delta_{e_0}} r}{\ell \mathrel{\Delta_{e_0^\star}} r} \qquad
\frac{\ell_0 \mathrel{\Delta_{e_0}} r_0 \quad \ell_1 \mathrel{\Delta_{e_1}} r_1}{\ell_0 \parallel \ell_1 \mathrel{\Delta_{e_0 \parallel e_1}} r_0 \parallel r_1}
$$

Given $e \in \mathcal{T}$, we refer to $\Delta_e$ as the *parallel splitting relation* of $e$, and to the elements of $\Delta_e$ as *parallel splices* of $e$. Before we can use $\Delta_e$ to construct the preclosure of $e$, we go over a number of properties of the parallel splitting relation. The first of these properties is that a given $e \in \mathcal{T}$ has only finitely many parallel splices. This will be useful later, when we involve *all* parallel splices of $e$ in building a new term, i.e., to guarantee that the constructed term is finite.

**Lemma 4.4.** *For $e \in \mathcal{T}$, $\Delta_e$ is finite.*
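The finiteness of $\Delta_e$ can be made tangible by enumerating the splices. The sketch below follows one plausible reading of the splitting rules of Definition 4.3 (base pairs $\langle 1, e \rangle$ and $\langle e, 1 \rangle$; congruence cases for $+$, for $\cdot$ guarded by nullability, for $\star$, and for $\parallel$); both the reading and the tuple encoding of terms are our own illustrative assumptions.

```python
ONE = ("1",)

def eps(e):
    """Nullability on tuple-encoded terms."""
    op = e[0]
    if op in ("1", "*"):
        return 1
    if op in ("0", "atom"):
        return 0
    if op == "+":
        return eps(e[1]) | eps(e[2])
    return eps(e[1]) & eps(e[2])    # "." and "par"

def splices(e):
    """All pairs (l, r) with l Δ_e r, computed recursively."""
    result = {(ONE, e), (e, ONE)}               # base pairs
    op = e[0]
    if op == "+":
        result |= splices(e[1]) | splices(e[2])
    elif op == ".":
        if eps(e[2]) == 1:
            result |= splices(e[1])
        if eps(e[1]) == 1:
            result |= splices(e[2])
    elif op == "*":
        result |= splices(e[1])
    elif op == "par":
        for l0, r0 in splices(e[1]):
            for l1, r1 in splices(e[2]):
                result.add((("par", l0, l1), ("par", r0, r1)))
    return result
```

Since each case only combines the finitely many splices of strict subterms, the result is always a finite set, in line with Lemma 4.4.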

We furthermore note that the parallel composition of any parallel splice of $e$ is ordered below $e$ by $\leqq_{\mathsf{BKA}}$. This guarantees that parallel splices never contain extra information, i.e., that their semantics do not contain pomsets that do not occur in the semantics of $e$. It also allows us to bound the width of the parallel splices by the width of the term being split, as a result of Lemma 3.10.

**Lemma 4.5.** *Let $e \in \mathcal{T}$. If $\ell \mathrel{\Delta_e} r$, then $\ell \parallel r \leqq_{\mathsf{BKA}} e$.*

**Corollary 4.1.** *Let $e \in \mathcal{T}$. If $\ell \mathrel{\Delta_e} r$, then $|\ell| + |r| \leq |e|$.*

Finally, we show that $\Delta_e$ is *dense* when it comes to parallel pomsets, meaning that if we have a parallelly composed pomset in the semantics of $e$, then we can find a parallel splice such that one parallel component is contained in the semantics of one side of the pair, and the other component in that of the other side.

**Lemma 4.6.** *Let $e \in \mathcal{T}$, and let $V, W$ be pomsets such that $V \parallel W \in [\![e]\!]_{\mathsf{BKA}}$. Then there exist $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\Delta_e} r$ such that $V \in [\![\ell]\!]_{\mathsf{BKA}}$ and $W \in [\![r]\!]_{\mathsf{BKA}}$.*

*Proof.* The proof proceeds by induction on $e$. In the base, we can discount the case where $e = 0$, for then the claim holds vacuously. This leaves us two cases.

- If $e = 1$, then $V \parallel W = 1$, and hence $V = W = 1$; choosing $\ell = r = 1$ satisfies the claim.
- If $e = a$ for some $a \in \Sigma$, then $V \parallel W = a$; thus either $V = a$ and $W = 1$, or $V = 1$ and $W = a$. In the former case we choose $\ell = a$ and $r = 1$, and in the latter case $\ell = 1$ and $r = a$; either choice satisfies the claim.
For the inductive step, there are four cases to consider.

- Suppose that $e = e_0 \cdot e_1$; then $V \parallel W = U_0 \cdot U_1$ for some $U_0 \in [\![e_0]\!]_{\mathsf{BKA}}$ and $U_1 \in [\![e_1]\!]_{\mathsf{BKA}}$.
  - Suppose that $U_i = 1$ for some $i \in 2$, meaning that $V \parallel W = U_0 \cdot U_1 = U_{1-i} \in [\![e_{1-i}]\!]_{\mathsf{BKA}}$ for this $i$. By induction, we find $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\Delta_{e_{1-i}}} r$, and $V \in [\![\ell]\!]_{\mathsf{BKA}}$ as well as $W \in [\![r]\!]_{\mathsf{BKA}}$. Since $U_i = 1 \in [\![e_i]\!]_{\mathsf{BKA}}$, we have that $\epsilon(e_i) = 1$ by Lemma 3.9, and thus $\ell \mathrel{\Delta_e} r$.
  - Suppose that $V = 1$ or $W = 1$. In the former case, $V \parallel W = W = U_0 \cdot U_1 \in [\![e]\!]_{\mathsf{CKA}}$. We then choose $\ell = 1$ and $r = e$ to satisfy the claim. In the latter case, we can choose $\ell = e$ and $r = 1$ to satisfy the claim analogously.

- Suppose that $e = e_0^\star$; then $V \parallel W = U_0 \cdot U_1 \cdots U_{n-1}$ for some $U_0, U_1, \ldots, U_{n-1} \in [\![e_0]\!]_{\mathsf{BKA}}$. If $n > 0$, we can assume without loss of generality that, for $0 \leq i < n$, it holds that $U_i \neq 1$. By Lemma 3.1, there are two subcases to consider.
  - Suppose that $V, W \neq 1$; then $n = 1$ (for otherwise $U_j = 1$ for some $0 \leq j < n$ by Lemma 3.1, which contradicts the above). Since $V \parallel W = U_0 \in [\![e_0]\!]_{\mathsf{BKA}}$, we find by induction $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\Delta_{e_0}} r$ such that $V \in [\![\ell]\!]_{\mathsf{BKA}}$ and $W \in [\![r]\!]_{\mathsf{BKA}}$. The claim then follows by the fact that $\ell \mathrel{\Delta_e} r$.
  - Suppose that $V = 1$ or $W = 1$. In the former case, $V \parallel W = W = U_0 \cdot U_1 \cdots U_{n-1} \in [\![e]\!]_{\mathsf{CKA}}$. We then choose $\ell = 1$ and $r = e$ to satisfy the claim. In the latter case, we can choose $\ell = e$ and $r = 1$ to satisfy the claim analogously.

*Example 4.3.* Let $U = a \parallel c$ and $V = b$, and note that $U \parallel V \in [\![e_0 \parallel e_1]\!]_{\mathsf{CKA}}$. We can then find that $a \mathrel{\Delta_a} 1$ and $1 \mathrel{\Delta_b} b$, and thus $a \parallel 1 \mathrel{\Delta_{e_0}} 1 \parallel b$. Since also $c \mathrel{\Delta_c} 1$, it follows that $(a \parallel 1) \parallel c \mathrel{\Delta_{e_0 \parallel e_1}} (1 \parallel b) \parallel 1$. We can then choose $\ell = (a \parallel 1) \parallel c$ and $r = (1 \parallel b) \parallel 1$ to find that $U \in [\![\ell]\!]_{\mathsf{BKA}}$ and $V \in [\![r]\!]_{\mathsf{BKA}}$, while $\ell \mathrel{\Delta_{e_0 \parallel e_1}} r$.

With parallel splitting in hand, we can define an operator on terms that combines all parallel splices of a parallel composition in a way that accounts for all of their downward closures.

**Definition 4.4.** *Let* e, f ∈ T *, and suppose that, for every* g ∈ T *such that* |g| < |e| + |f|*, there exists a closure* g↓*. The term* e ⊙ f *is defined as follows:*

$$e \odot f \triangleq e \parallel f + \sum\_{\substack{\ell\, \Delta\_{e \parallel f}\, r \\ |\ell|, |r| < |e \parallel f|}} \ell\mathord\downarrow \parallel r\mathord\downarrow$$

Note that e ⊙ f is well-defined: the sum is finite since Δ_{e∥f} is finite by Lemma 4.4, and furthermore ℓ↓ and r↓ exist, as we required that |ℓ|, |r| < |e ∥ f|.
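To make the shape of this definition concrete, the following Python sketch assembles the sum from Definition 4.4. It is a toy, not the paper's implementation: terms are plain strings, and `split`, `close` and `width` are assumptions supplied by the caller, standing in for Δ, ↓ and |·|.

```python
# Hypothetical sketch of Definition 4.4. `split(e, f)` stands in for the
# pairs ℓ Δ_{e∥f} r, `close` for the closure g↓ on terms of smaller width,
# and `width` for |·|; none of these are derived from the paper's rules here.

def preclosure(e, f, split, close, width):
    bound = width(e) + width(f)          # the width of e ∥ f
    summands = [f"{e} ∥ {f}"]            # the summand e ∥ f itself
    for l, r in split(e, f):
        if width(l) < bound and width(r) < bound:
            summands.append(f"{close(l)} ∥ {close(r)}")
    return " + ".join(summands)

# Toy inputs: width counts letters, atomic terms are their own closures,
# and a ∥ b is assumed to split into the atoms a and b (both ways).
toy_width = lambda t: sum(ch.isalpha() for ch in t)
toy_split = lambda e, f: [(e, f), (f, e)]
toy_close = lambda t: t
```

For instance, `preclosure("a", "b", toy_split, toy_close, toy_width)` yields `a ∥ b + a ∥ b + b ∥ a`: the term itself plus the closed splices, which is exactly the summand structure of e ⊙ f.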

*Example 4.4.* Let us compute e_0 ⊙ e_1 and verify that we obtain a preclosure of e_0 ∥ e_1. Working through the definition, we see that Δ_{e_0∥e_1} consists of the pairs


Since closure is invariant with respect to ≡CKA, we can simplify these terms by applying the axioms of CKA. After folding the unit subterms, we are left with

$$\langle 1, a \parallel b \parallel c \rangle \qquad \langle c, a \parallel b \rangle \qquad \langle b, a \parallel c \rangle \qquad \langle b \parallel c, a \rangle \qquad \langle a, b \parallel c \rangle \qquad \langle a \parallel c, b \rangle$$

Recall that a ∥ b + a · b + b · a is a closure of a ∥ b. Now, we find that

$$\begin{aligned} e\_0 \odot e\_1 &= (a \parallel b) \parallel c + c \parallel (a \parallel b + a \cdot b + b \cdot a) \\ &\quad + b \parallel (a \parallel c + a \cdot c + c \cdot a) + (b \parallel c + b \cdot c + c \cdot b) \parallel a \\ &\quad + a \parallel (b \parallel c + b \cdot c + c \cdot b) + (a \parallel c + a \cdot c + c \cdot a) \parallel b \\ &\equiv\_{\mathsf{CKA}} a \parallel b \parallel c + a \parallel (b \cdot c + c \cdot b) + b \parallel (a \cdot c + c \cdot a) + c \parallel (a \cdot b + b \cdot a) \end{aligned}$$

which was shown to be a preclosure of e_0 ∥ e_1 in Example 4.2.

The general proof of correctness for ⊙ as a preclosure plays out as follows.

**Lemma 4.7.** *Let* e, f ∈ T *, and suppose that, for every* g ∈ T *with* |g| < |e| + |f|*, there exists a closure* g↓*. Then* e ⊙ f *is a preclosure of* e ∥ f*.*

*Proof.* We start by showing that e ⊙ f ≡_CKA e ∥ f. First, note that e ∥ f ≤_BKA e ⊙ f by definition of e ⊙ f. For the other direction, suppose that ℓ, r ∈ T are such that ℓ Δ_{e∥f} r. By definition of closure, we know that ℓ↓ ∥ r↓ ≡_CKA ℓ ∥ r. By Lemma 4.5, we have ℓ ∥ r ≤_BKA e ∥ f. Since every subterm of e ⊙ f is thus ordered below e ∥ f by ≤_CKA, we have that e ⊙ f ≤_CKA e ∥ f. It then follows that e ⊙ f ≡_CKA e ∥ f.

For the second requirement, suppose that X ∈ ⟦e ∥ f⟧_CKA is non-sequential. We then know that there exists a Y ∈ ⟦e ∥ f⟧_BKA such that X ⊑ Y. This leaves us two cases to consider.


This means that ℓ↓ ∥ r↓ ≤_BKA e ⊙ f. Since X_0 ∈ ⟦ℓ↓⟧_BKA and X_1 ∈ ⟦r↓⟧_BKA by definition of closure, we can derive by Lemma 3.8 that

$$X = X\_0 \parallel X\_1 \in \left[ \ell \downarrow \parallel r \downarrow \right]\_{\mathsf{BKA}} \subseteq \left[ e \odot f \right]\_{\mathsf{BKA}} \tag{7}$$

#### **4.2 Closure**

The preclosure operator discussed above covers the non-sequential pomsets in the language ⟦e ∥ f⟧_CKA; it remains to find a term that covers the sequential pomsets contained in ⟦e ∥ f⟧_CKA.

To give some intuition for the construction ahead, we first explore the observations that can be made when a sequential pomset W · X appears in the language ⟦e ∥ f⟧_CKA; without loss of generality, assume that W is non-sequential. In this setting, there must exist U ∈ ⟦e⟧_BKA and V ∈ ⟦f⟧_BKA such that W · X ⊑ U ∥ V. By Lemma 3.6, we find pomsets U_0, U_1, V_0, V_1 such that

$$W \sqsubseteq U\_0 \parallel V\_0 \qquad\qquad X \sqsubseteq U\_1 \parallel V\_1 \qquad\qquad U\_0 \cdot U\_1 \sqsubseteq U \qquad\qquad V\_0 \cdot V\_1 \sqsubseteq V$$

This means that U_0 · U_1 ∈ ⟦e⟧_CKA and V_0 · V_1 ∈ ⟦f⟧_CKA. Now, suppose we could find e_0, e_1, f_0, f_1 ∈ T such that

$$\begin{array}{lll} e\_0 \cdot e\_1 \leq\_{\mathsf{CKA}} e & \quad U\_0 \in \left[ e\_0 \right]\_{\mathsf{CKA}} & \quad U\_1 \in \left[ e\_1 \right]\_{\mathsf{CKA}}, \\ f\_0 \cdot f\_1 \leq\_{\mathsf{CKA}} f & \quad V\_0 \in \left[ f\_0 \right]\_{\mathsf{CKA}} & \quad V\_1 \in \left[ f\_1 \right]\_{\mathsf{CKA}}. \end{array}$$

Then we have W ∈ ⟦e_0 ⊙ f_0⟧_BKA, and X ∈ ⟦e_1 ∥ f_1⟧_CKA. Thus, if we can find a closure of e_1 ∥ f_1, then we have a term whose BKA-semantics contains W · X.

There are two obstacles that need to be resolved before we can use the observations above to find the closure of e ∥ f. The first problem is that we need to be sure that this process of splitting terms into sequential components is at all possible, i.e., that we can split e into e_0 and e_1 with e_0 · e_1 ≤_CKA e and U_i ∈ ⟦e_i⟧_CKA for i ∈ 2. We do this by designing a sequential analogue of the parallel splitting relation seen before. The second problem, which we will address later in this section, is whether this process of splitting a parallel term e ∥ f according to the exchange law and finding a closure of the remaining term e_1 ∥ f_1 is well-founded, i.e., whether we can find "enough" of these terms to cover all possible ways of sequentialising e ∥ f. This will turn out to be possible, by using the fixpoint axioms of KA as in Sect. 3.3 with linear systems.

We start by defining the sequential splitting relation.<sup>3</sup>

**Definition 4.5.** *Let* <sup>e</sup> ∈ T *;* <sup>∇</sup>e *is the smallest relation on* <sup>T</sup> *such that*

$$
\frac{}{1 \,\nabla\_{1}\, 1} \qquad
\frac{}{a \,\nabla\_{a}\, 1} \qquad
\frac{}{1 \,\nabla\_{a}\, a} \qquad
\frac{}{1 \,\nabla\_{e\_0^\star}\, 1} \qquad
\frac{}{e\_0 \parallel e\_1 \,\nabla\_{e\_0 \parallel e\_1}\, 1} \qquad
\frac{}{1 \,\nabla\_{e\_0 \parallel e\_1}\, e\_0 \parallel e\_1}
$$

$$
\frac{\ell \,\nabla\_{e\_0}\, r}{\ell \,\nabla\_{e\_0+e\_1}\, r} \qquad
\frac{\ell \,\nabla\_{e\_1}\, r}{\ell \,\nabla\_{e\_0+e\_1}\, r} \qquad
\frac{\ell \,\nabla\_{e\_0}\, r}{\ell \,\nabla\_{e\_0 \cdot e\_1}\, r \cdot e\_1} \qquad
\frac{\ell \,\nabla\_{e\_1}\, r}{e\_0 \cdot \ell \,\nabla\_{e\_0 \cdot e\_1}\, r} \qquad
\frac{\ell \,\nabla\_{e\_0}\, r}{e\_0^\star \cdot \ell \,\nabla\_{e\_0^\star}\, r \cdot e\_0^\star}
$$

Given <sup>e</sup> ∈ T , we refer to <sup>∇</sup>e as the *sequential splitting relation* of <sup>e</sup>, and to the elements of <sup>∇</sup>e as *sequential splices* of <sup>e</sup>. We need to establish a few properties of the sequential splitting relation that will be useful later on. The first of these properties is that, as for parallel splitting, <sup>∇</sup>e is finite.

### **Lemma 4.8.** *For* <sup>e</sup> ∈ T *,* <sup>∇</sup>e *is finite.*

We also have that the sequential composition of splices is provably below the term being split. Just like the analogous lemma for parallel splitting, this guarantees that our sequential splices never give rise to semantics not contained in the split term. This lemma also yields an observation about the width of sequential splices when compared to the term being split.

**Lemma 4.9.** *Let* e ∈ T *. If* ℓ, r ∈ T *with* ℓ ∇_e r*, then* ℓ · r ≤_CKA e*.*

**Corollary 4.2.** *Let* e ∈ T *. If* ℓ, r ∈ T *with* ℓ ∇_e r*, then* |ℓ|, |r| ≤ |e|*.*

Lastly, we show that the splices cover every way of (sequentially) splitting up the semantics of the term being split, i.e., that <sup>∇</sup>e is dense when it comes to sequentially composed pomsets.

**Lemma 4.10.** *Let* e ∈ T *, and let* V *and* W *be pomsets such that* V · W ∈ ⟦e⟧_CKA*. Then there exist* ℓ, r ∈ T *with* ℓ ∇_e r *such that* V ∈ ⟦ℓ⟧_CKA *and* W ∈ ⟦r⟧_CKA*.*

*Proof.* The proof proceeds by induction on e. In the base, we can discount the case where e = 0, for then the claim holds vacuously. This leaves us two cases.

<sup>3</sup> The contents of this relation are very similar to the set of *left- and right-spines* of a NetKAT expression as used in [5].


For the inductive step, there are four cases to consider.


For the case where n > 0, we find by Lemma 3.4 a 0 ≤ m < n and series-parallel pomsets X, Y such that X · Y ⊑ U_m, and V ⊑ U_0 · U_1 ··· U_{m−1} · X and W ⊑ Y · U_{m+1} · U_{m+2} ··· U_n. Since X · Y ⊑ U_m ∈ ⟦e_0⟧_CKA and thus X · Y ∈ ⟦e_0⟧_CKA, we find by induction ℓ′, r′ ∈ T with ℓ′ ∇_{e_0} r′ and X ∈ ⟦ℓ′⟧_CKA and Y ∈ ⟦r′⟧_CKA. We can then choose ℓ = e_0⋆ · ℓ′ and r = r′ · e_0⋆ to find that V ⊑ U_0 · U_1 ··· U_{m−1} · X ∈ ⟦e_0⋆⟧_CKA · ⟦ℓ′⟧_CKA = ⟦ℓ⟧_CKA and W ⊑ Y · U_{m+1} · U_{m+2} ··· U_n ∈ ⟦r′⟧_CKA · ⟦e_0⋆⟧_CKA = ⟦r⟧_CKA, and thus that V ∈ ⟦ℓ⟧_CKA and W ∈ ⟦r⟧_CKA. Since ℓ ∇_e r holds, the claim follows.

*Example 4.5.* Let U be the pomset ca and let V be bc. Furthermore, let e be the term (a · b + c)⋆, and note that U · V ∈ ⟦e⟧_CKA. We then find that a ∇_a 1, and thus a ∇_{a·b} 1 · b as well as a ∇_{a·b+c} 1 · b. We can now choose ℓ = (a · b + c)⋆ · a and r = (1 · b) · (a · b + c)⋆ to find that U ∈ ⟦ℓ⟧_CKA and V ∈ ⟦r⟧_CKA, while ℓ ∇_e r.
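The inductive rules of Definition 4.5 can also be run directly. Below is a hedged Python sketch of ∇ for the 0/1/a/+/·/⋆ fragment only (the treatment of parallel subterms and of any simplification modulo ≡ is omitted); it reproduces the splice used in Example 4.5.

```python
# Sketch of the sequential splitting relation ∇_e, restricted to the
# 0, 1, a, +, ·, ⋆ fragment. Terms are nested tuples:
# ("zero",), ("one",), ("act", "a"), ("plus", e0, e1),
# ("seq", e0, e1), ("star", e0).

def splices(e):
    """All pairs (l, r) with l ∇_e r."""
    k = e[0]
    if k == "one":
        return {(e, e)}                        # 1 ∇_1 1
    if k == "act":
        return {(e, ("one",)), (("one",), e)}  # a ∇_a 1 and 1 ∇_a a
    if k == "plus":                            # splices of either summand
        return splices(e[1]) | splices(e[2])
    if k == "seq":
        e0, e1 = e[1], e[2]
        left = {(l, ("seq", r, e1)) for l, r in splices(e0)}
        right = {(("seq", e0, l), r) for l, r in splices(e1)}
        return left | right
    if k == "star":                            # 1 ∇_{e0*} 1 and the star rule
        inner = {(("seq", e, l), ("seq", r, e)) for l, r in splices(e[1])}
        return {(("one",), ("one",))} | inner
    return set()                               # 0 has no splices
```

On e = (a · b + c)⋆ this yields, among others, the pair ℓ = e · a and r = (1 · b) · e, matching the splice of Example 4.5; the returned set is always finite, as Lemma 4.8 asserts.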

We know how to split a term sequentially. To resolve the second problem, we need to show that the process of splitting terms repeatedly ends somewhere. This is formalised in the notion of *right-hand remainders*, which are the terms that can appear as the right hand of a sequential splice of a term.

**Definition 4.6.** *Let* e ∈ T *. The set of* (right-hand) remainders *of* e*, written* R(e)*, is the smallest set satisfying the rules*

$$\overline{e \in R(e)} \qquad \qquad \qquad \frac{f \in R(e) \qquad \ell \,\,\nabla\_f \,\, r}{r \in R(e)}$$

**Lemma 4.11.** *Let* e ∈ T *.* R(e) *is finite.*
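Computed naively, R(e) is the least fixpoint of the rules in Definition 4.6, and Lemma 4.11 is reflected in the fact that the loop below terminates. A hedged Python sketch, parametric in the splitting function (the toy ∇ used in the demonstration splits one-letter words in every possible way; it is an assumption, not the paper's relation):

```python
# Sketch of Definition 4.6: the set R(e) of right-hand remainders, computed
# as a least fixpoint. The splitting function is a parameter, so any
# implementation of ∇ can be plugged in.

def remainders(e, splices):
    seen, todo = {e}, [e]
    while todo:
        for _, r in splices(todo.pop()):   # keep right-hand splices
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

# A toy ∇ on words over {a}: a word splits as l · r in every way.
def word_splices(w):
    return {(w[:i], w[i:]) for i in range(len(w) + 1)}
```

Here `remainders("aaa", word_splices)` is the finite set `{"aaa", "aa", "a", ""}`: exactly the suffixes of the word.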

With splitting and remainders we are in a position to define the linear system that will yield the closure of a parallel composition. Intuitively, we can think of this system as an automaton: every variable corresponds to a state, and every row of the matrix describes the "transitions" of the corresponding state, while every element of the vector describes the language "accepted" by that state without taking a single transition. Solving the system for a least fixpoint can be thought of as finding an expression that describes the language of the automaton.
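The automaton analogy can be made executable. The following Python sketch solves a linear system x ≥ M·x + p over regular-expression-like terms by elimination, using only the least-fixpoint rule of KA (the least solution of x ≥ a·x + b is a⋆·b). This mirrors the role played by Lemma 3.12, though the paper's actual construction and term representation are not the ones used here.

```python
# Sketch: least solutions of x ≥ M·x + p over regular-expression-like
# terms, by Gaussian elimination. Terms are strings with only unit/zero
# simplification; the sole KA fact used is the least-fixpoint rule.

def plus(a, b):
    if a == "0": return b
    if b == "0": return a
    return f"({a} + {b})"

def seq(a, b):
    if "0" in (a, b): return "0"
    if a == "1": return b
    if b == "1": return a
    return f"{a}·{b}"

def star(a):
    return "1" if a in ("0", "1") else f"({a})*"

def solve(M, p):
    n = len(p)
    M = [row[:] for row in M]; p = p[:]
    for i in range(n):                       # forward elimination
        s = star(M[i][i])                    # least-fixpoint rule at x(i)
        p[i] = seq(s, p[i])
        M[i] = ["0" if j == i else seq(s, M[i][j]) for j in range(n)]
        for k in range(i + 1, n):            # substitute x(i) into later rows
            c = M[k][i]
            if c == "0":
                continue
            p[k] = plus(p[k], seq(c, p[i]))
            for j in range(n):
                if j != i:
                    M[k][j] = plus(M[k][j], seq(c, M[i][j]))
            M[k][i] = "0"
    x = ["0"] * n
    for i in reversed(range(n)):             # back-substitution
        acc = p[i]
        for j in range(i + 1, n):
            acc = plus(acc, seq(M[i][j], x[j]))
        x[i] = acc
    return x
```

For the one-variable system x ≥ a·x + b this returns `(a)*·b`, and for the pair x₀ ≥ a·x₁ + 1, x₁ ≥ b·x₀ it returns terms equivalent to (a·b)⋆ and b·(a·b)⋆.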

**Definition 4.7.** *Let* e, f ∈ T *, and suppose that, for every* g ∈ T *such that* |g| < |e| + |f|*, there exists a closure* g↓*. We choose*

$$I\_{e,f} = \{ g \parallel h : g \in R(e), h \in R(f) \}$$

*The* <sup>I</sup>e,f *-vector* <sup>p</sup>e,f *and* <sup>I</sup>e,f *-matrix* <sup>M</sup>e,f *are chosen as follows.*

$$p\_{e,f}(g \parallel h) \triangleq g \parallel h \qquad\qquad M\_{e,f}(g \parallel h,\, r\_g \parallel r\_h) \triangleq \sum\_{\substack{\ell\_g \nabla\_g r\_g \\ \ell\_h \nabla\_h r\_h}} \ell\_g \odot \ell\_h$$

I_{e,f} *is finite by Lemma 4.11. We write* L_{e,f} *for the* I_{e,f}*-linear system* (M_{e,f}, p_{e,f})*.*

We can check that M_{e,f} is well-defined. First, the sum is finite, because ∇_g and ∇_h are finite by Lemma 4.8. Second, if g ∥ h ∈ I and ℓ_g, r_g, ℓ_h, r_h ∈ T are such that ℓ_g ∇_g r_g and ℓ_h ∇_h r_h, then |ℓ_g| ≤ |g| ≤ |e| and |ℓ_h| ≤ |h| ≤ |f| by Corollary 4.2, and thus, if d ∈ T is such that |d| < |ℓ_g| + |ℓ_h|, then |d| < |e| + |f|, and therefore a closure of d exists, meaning that ℓ_g ⊙ ℓ_h exists, too.

The least solution to L_{e,f} obtained through Lemma 3.12 is the I_{e,f}-vector denoted by s_{e,f}. We write e ⊗ f for s_{e,f}(e ∥ f), i.e., the least solution at e ∥ f.

Using the previous lemmas, we can then show that e ⊗ f is indeed a closure of e ∥ f, provided that we have closures for all terms of strictly lower width. The intuition of this proof is that we use the uniqueness of least fixpoints to show that e ∥ f ≡_CKA e ⊗ f, and then use the properties of preclosure and the normal form of series-parallel pomsets to show that ⟦e ∥ f⟧_CKA = ⟦e ⊗ f⟧_BKA.

**Lemma 4.12.** *Let* e, f ∈ T *, and suppose that, for every* g ∈ T *with* |g| < |e| + |f|*, there exists a closure* g↓*. Then* e ⊗ f *is a closure of* e ∥ f*.*

*Proof.* We begin by showing that e ∥ f ≡_CKA e ⊗ f. We can see that p_{e,f} is a solution to L_{e,f}, by calculating for g ∥ h ∈ I_{e,f}:

$$\begin{aligned}
(p\_{e,f} + M\_{e,f} \cdot p\_{e,f})(g \parallel h)
&= g \parallel h + \sum\_{r\_g \parallel r\_h \in I} \Bigl( \sum\_{\substack{\ell\_g \nabla\_g r\_g \\ \ell\_h \nabla\_h r\_h}} \ell\_g \odot \ell\_h \Bigr) \cdot (r\_g \parallel r\_h) && \text{(def. } M\_{e,f}, p\_{e,f}\text{)} \\
&\equiv\_{\mathsf{CKA}} g \parallel h + \sum\_{r\_g \parallel r\_h \in I} \sum\_{\substack{\ell\_g \nabla\_g r\_g \\ \ell\_h \nabla\_h r\_h}} (\ell\_g \odot \ell\_h) \cdot (r\_g \parallel r\_h) && \text{(distributivity)} \\
&\equiv\_{\mathsf{CKA}} g \parallel h + \sum\_{r\_g \parallel r\_h \in I} \sum\_{\substack{\ell\_g \nabla\_g r\_g \\ \ell\_h \nabla\_h r\_h}} (\ell\_g \parallel \ell\_h) \cdot (r\_g \parallel r\_h) && \text{(Lemma 4.7)} \\
&\leq\_{\mathsf{CKA}} g \parallel h + \sum\_{r\_g \parallel r\_h \in I} \sum\_{\substack{\ell\_g \nabla\_g r\_g \\ \ell\_h \nabla\_h r\_h}} (\ell\_g \cdot r\_g) \parallel (\ell\_h \cdot r\_h) && \text{(exchange)} \\
&\leq\_{\mathsf{CKA}} g \parallel h + \sum\_{r\_g \parallel r\_h \in I} \sum\_{\substack{\ell\_g \nabla\_g r\_g \\ \ell\_h \nabla\_h r\_h}} g \parallel h && \text{(Lemma 4.9)} \\
&\equiv\_{\mathsf{CKA}} g \parallel h && \text{(idempotence)} \\
&= p\_{e,f}(g \parallel h) && \text{(def. } p\_{e,f}\text{)}
\end{aligned}$$

To see that p_{e,f} is the *least* solution to L_{e,f}, let q_{e,f} be a solution to L_{e,f}. We then know that M_{e,f} · q_{e,f} + p_{e,f} ≤_CKA q_{e,f}; thus, in particular, p_{e,f} ≤_CKA q_{e,f}. Since the least solution to a linear system is unique up to ≡_CKA, we find that s_{e,f} ≡_CKA p_{e,f}, and therefore that e ⊗ f = s_{e,f}(e ∥ f) ≡_CKA p_{e,f}(e ∥ f) = e ∥ f.

It remains to show that if U ∈ ⟦e ∥ f⟧_CKA, then U ∈ ⟦e ⊗ f⟧_BKA. To show this, we show the more general claim that if g ∥ h ∈ I and U ∈ ⟦g ∥ h⟧_CKA, then U ∈ ⟦s_{e,f}(g ∥ h)⟧_BKA. Write U = U_0 · U_1 ··· U_{n−1} such that for 0 ≤ i < n, U_i is non-sequential (as in Corollary 3.1). The proof proceeds by induction on n. In the base, we have that n = 0. In this case, U = 1, and thus U ∈ ⟦g ∥ h⟧_BKA by Lemma 3.2. Since g ∥ h = p_{e,f}(g ∥ h) ≤_BKA s_{e,f}(g ∥ h), it follows that U ∈ ⟦s_{e,f}(g ∥ h)⟧_BKA by Lemma 3.8.

For the inductive step, assume the claim holds for n−1. We write U = U_0 · U′, with U′ = U_1 · U_2 ··· U_{n−1}. Since U_0 · U′ ∈ ⟦g ∥ h⟧_CKA, there exist W ∈ ⟦g⟧_CKA and X ∈ ⟦h⟧_CKA such that U_0 · U′ ⊑ W ∥ X. By Lemma 3.6, we find pomsets W_0, W_1, X_0, X_1 such that W_0 · W_1 ⊑ W and X_0 · X_1 ⊑ X, as well as U_0 ⊑ W_0 ∥ X_0 and U′ ⊑ W_1 ∥ X_1. By Lemma 4.10, we find ℓ_g, r_g, ℓ_h, r_h ∈ T with ℓ_g ∇_g r_g and ℓ_h ∇_h r_h, such that W_0 ∈ ⟦ℓ_g⟧_CKA, W_1 ∈ ⟦r_g⟧_CKA, X_0 ∈ ⟦ℓ_h⟧_CKA and X_1 ∈ ⟦r_h⟧_CKA.

From this, we know that U_0 ∈ ⟦ℓ_g ∥ ℓ_h⟧_CKA and U′ ∈ ⟦r_g ∥ r_h⟧_CKA. Since U_0 is non-sequential, we have that U_0 ∈ ⟦ℓ_g ⊙ ℓ_h⟧_BKA. Moreover, by induction we find that U′ ∈ ⟦s_{e,f}(r_g ∥ r_h)⟧_BKA. Since ℓ_g ⊙ ℓ_h ≤_BKA M_{e,f}(g ∥ h, r_g ∥ r_h) by definition of M_{e,f}, we furthermore find that

$$(\ell\_g \odot \ell\_h) \cdot s\_{e,f}(r\_g \parallel r\_h) \leq\_{\mathsf{BKA}} M\_{e,f}(g \parallel h, r\_g \parallel r\_h) \cdot s\_{e,f}(r\_g \parallel r\_h)$$

Since r_g ∥ r_h ∈ I, we find by definition of the solution to a linear system that

$$M\_{e,f}(g \parallel h,\, r\_g \parallel r\_h) \cdot s\_{e,f}(r\_g \parallel r\_h) \leq\_{\mathsf{BKA}} s\_{e,f}(g \parallel h)$$

By Lemma 3.8 and the above, we conclude that U = U_0 · U′ ∈ ⟦s_{e,f}(g ∥ h)⟧_BKA.

For a concrete example where we find a closure of a (non-trivial) parallel composition by solving a linear system, we refer to Appendix A.

With closure of parallel composition, we can construct a closure for any term and therefore conclude completeness of CKA.

**Theorem 4.1.** *Let* e ∈ T *. We can construct a closure* e↓ *of* e*.*

*Proof.* The proof proceeds by induction on |e| and the structure of e, i.e., by considering f before g if |f| < |g|, or if f is a strict subterm of g (in which case |f|≤|g| also holds). It is not hard to see that this induces a well-ordering on T .

Let e be a term of width n, and suppose that the claim holds for all terms of width at most n − 1, and for all strict subterms of e. There are three cases.


**Corollary 4.3.** *Let* e, f ∈ T *. If* ⟦e⟧_CKA = ⟦f⟧_CKA*, then* e ≡_CKA f*.*

*Proof.* Follows from Theorem 4.1 and Lemma 4.1.

### **5 Discussion and Further Work**

By building a syntactic closure for each series-rational expression, we have shown that the standard axiomatisation of CKA is complete with respect to the CKA-semantics of series-rational terms. Consequently, the algebra of closed series-rational pomset languages forms the free CKA.

Our result leads to several decision procedures for the equational theory of CKA. For instance, one can compute the closure of a term as described in the present paper, and use an existing decision procedure for BKA [3,12,20]. Note however that although this approach seems suited for theoretical developments (such as formalising the results in a proof assistant), its complexity makes it less appealing for practical use. More practically, one could leverage recent work by Brunet et al. [3], which provides an algorithm to compare closed series-rational pomset languages. Since this is the free concurrent Kleene algebra, this algorithm can now be used to decide the equational theory of CKA. We also obtain from the latter paper that this decision problem is EXPSPACE-complete.

We furthermore note that the algorithm to compute downward closure can be used to extend half of the result from [14] to a Kleene theorem that relates the CKA-semantics of expressions to the pomset automata proposed there: if e ∈ T, we can construct a pomset automaton A with a state q such that L_A(q) = ⟦e⟧_CKA.

Having established pomset automata as an operational model of CKA, a further question is whether these automata are amenable to a bisimulation-based equivalence algorithm, as is the case for finite automata [10]. If this is the case, optimisations such as those in [2] might have analogues for pomset automata that can be found using the coalgebraic method [23].

While this work was in development, an unpublished draft by Laurence and Struth [19] appeared, with a first proof of completeness for CKA. The general outline of their proof is similar to our own, in that they prove that closure of pomset languages preserves series-rationality, and hence there exists a syntactic closure for every series-rational expression. However, the techniques used to establish this fact are quite different from the developments in the present paper. First, we build the closure via syntactic methods: explicit splitting relations and solutions of linear systems. Instead, their proof uses automata theoretic constructions and algebraic closure properties of regular languages; in particular, they rely on congruences of finite index and language homomorphisms. We believe that our approach leads to a substantially simpler and more transparent proof. Furthermore, even though Laurence and Struth do not seem to use any fundamentally non-constructive argument, their proof does not obviously yield an algorithm to effectively compute the closure of a given term. In contrast, our proof is explicit enough to be implemented directly; we wrote a simple Python script (under six hundred lines) to do just that [16].

A crucial ingredient in this work was the computation of least solutions of linear systems. This kind of construction has been used on several occasions for the study of Kleene algebras [1,4,18], and we provide here yet another variation of such a result. We feel that linear systems may not have yet been used to their full potential in this context, and could still lead to interesting developments.

A natural extension of the work conducted here would be to turn our attention to the signature of concurrent Kleene algebra that includes a "parallel star" operator e<sup>∥</sup>. The completeness result of Laurence and Struth [20] holds for BKA with the parallel star, so in principle one could hope to extend our syntactic closure construction to include this operator. Unfortunately, using the results of Laurence and Struth, we can show that this is not possible. They defined a notion of *depth* of a series-parallel pomset, intuitively corresponding to the nesting of parallel and sequential components. An important step in their development consists of proving that for every series-parallel-rational language there exists a finite upper bound on the depth of its elements. However, the language ⟦a<sup>∥</sup>⟧_CKA does not enjoy this property: it contains every series-parallel pomset exclusively labelled with the symbol a. Since we can build such pomsets with arbitrary depth, it follows that there does not exist a syntactic closure of the term a<sup>∥</sup>. New methods would thus be required to tackle the parallel star operator.
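To make the depth obstruction concrete, here is one family of witnesses (a sketch; the precise definition of depth is the one of Laurence and Struth):

$$U\_0 = a \qquad\qquad U\_{n+1} = (U\_n \parallel U\_n) \cdot a$$

Every U_n is a series-parallel pomset labelled exclusively with a, and each step nests a parallel composition inside a sequential one, so the depth of U_n grows with n. Any series-parallel-rational language bounds the depth of its elements, which the language of the parallel-starred term above cannot do.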

Another aspect of CKA that is not yet developed to the extent of KA is the coalgebraic perspective. We intend to investigate whether the coalgebraic tools developed for KA can be extended to CKA, which will hopefully lead to efficient bisimulation-based decision procedures [2,5].

**Acknowledgements.** We thank the anonymous reviewers for their insightful comments. This work was partially supported by the ERC Starting Grant ProFoundNet (grant code 679127).

### **A Worked Example: A Non-trivial Closure**

In this appendix, we solve an instance of a linear system as defined in Definition 4.7 for a given parallel composition. For the sake of brevity, the steps are somewhat coarse-grained; the reader is encouraged to reproduce the steps by hand.

Consider the expression e ∥ f = a⋆ ∥ b. The linear system L_{e,f} that we obtain from this expression consists of six inequations; in matrix form (with zeroes omitted and entries simplified modulo ≡_CKA), this system is summarised as follows:<sup>4</sup>

$$\begin{array}{r|cccccc|c}
 & 1 \parallel 1 & 1 \parallel b & a \cdot a^\star \parallel 1 & a^\star \parallel 1 & a \cdot a^\star \parallel b & a^\star \parallel b & p\_{e,f} \\ \hline
1 \parallel 1 & 1 & & & & & & 1 \parallel 1 \\
1 \parallel b & b & 1 & & & & & 1 \parallel b \\
a \cdot a^\star \parallel 1 & a & & a^\star & a \cdot a^\star & & & a \cdot a^\star \parallel 1 \\
a^\star \parallel 1 & 1 & & a^\star & a \cdot a^\star & & & a^\star \parallel 1 \\
a \cdot a^\star \parallel b & a \odot b & a & a^\star \odot b & a \cdot a^\star \odot b & a^\star & a \cdot a^\star & a \cdot a^\star \parallel b \\
a^\star \parallel b & b & 1 & a^\star \odot b & a \cdot a^\star \odot b & a^\star & a \cdot a^\star & a^\star \parallel b
\end{array}$$

Let us proceed under the assumption that x is a solution to the system; the constraint imposed on x by the first two rows is given by the inequations

$$x(1 \parallel 1) + 1 \leq\_{\mathsf{CKA}} x(1 \parallel 1) \tag{1}$$

$$b \cdot x(1 \parallel 1) + x(1 \parallel b) + b \leq\_{\mathsf{CKA}} x(1 \parallel b) \tag{2}$$

Because these inequations do not involve the other positions of the system, we can solve them in isolation, and use their solutions to find solutions for the remaining positions; it turns out that choosing x(1 ∥ 1) = 1 and x(1 ∥ b) = b suffices here.

We carry on to fill these values into the inequations given by the third and fourth row of the linear system. After some simplification, these work out to be

$$a \cdot a^\star + a \cdot a^\star \cdot x(a^\star \parallel 1) + a^\star \cdot x(a \cdot a^\star \parallel 1) \leq\_{\mathsf{CKA}} x(a \cdot a^\star \parallel 1) \tag{3}$$

$$a^\star + a^\star \cdot a \cdot x(a^\star \parallel 1) + a^\star \cdot x(a \cdot a^\star \parallel 1) \leq\_{\mathsf{CKA}} x(a^\star \parallel 1) \tag{4}$$

Applying the least fixpoint axiom to (3) and simplifying, we obtain

$$a \cdot a^\star + a \cdot a^\star \cdot x(a^\star \parallel 1) \leq\_{\mathsf{CKA}} x(a \cdot a^\star \parallel 1) \tag{5}$$

Substituting this into (4) and simplifying, we find that

$$a^\star + a \cdot a^\star \cdot x(a^\star \parallel 1) \leq\_{\mathsf{CKA}} x(a^\star \parallel 1) \tag{6}$$

This inequation, in turn, gives us that a⋆ ≤_CKA x(a⋆ ∥ 1) by the least fixpoint axiom. Plugging this back into (3) and simplifying, we find that

$$a \cdot a^\star + a^\star \cdot x(a \cdot a^\star \parallel 1) \leq\_{\mathsf{CKA}} x(a \cdot a^\star \parallel 1) \tag{7}$$

<sup>4</sup> Actually, the system obtained from a⋆ ∥ b as a result of Definition 4.7 is slightly larger; it also contains rows and columns labelled by 1 · a⋆ ∥ 1 and 1 · a⋆ ∥ b; these turn out to be redundant. We omit these rows from the example for simplicity.

Again by the least fixpoint axiom, this tells us that a · a⋆ ≤_CKA x(a · a⋆ ∥ 1). One easily checks that x(a · a⋆ ∥ 1) = a · a⋆ and x(a⋆ ∥ 1) = a⋆ are solutions to (3) and (4); by the observations above, they are also the least solutions.
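This check can also be mechanised in a truncated word model. After the earlier substitutions, (3) and (4) mention only sequential terms over the single letter a, so a language is just a set of word lengths, cut off at some bound N. The following Python snippet is a bounded sanity check, not a proof.

```python
# Bounded check that x(a·a* ∥ 1) = a·a* and x(a* ∥ 1) = a* satisfy (3), (4):
# languages of words over {a} are represented by their sets of lengths ≤ N.
N = 12
astar = set(range(N + 1))        # a*
aplus = set(range(1, N + 1))     # a·a*

def cat(X, Y):
    # concatenation of length-sets, truncated at N
    return {x + y for x in X for y in Y if x + y <= N}

x_ap, x_as = aplus, astar        # candidates for x(a·a* ∥ 1), x(a* ∥ 1)
lhs3 = aplus | cat(aplus, x_as) | cat(astar, x_ap)            # left side of (3)
lhs4 = astar | cat(astar, cat({1}, x_as)) | cat(astar, x_ap)  # left side of (4)
ok = lhs3 <= x_ap and lhs4 <= x_as
```

Here `ok` comes out `True`, in line with the claim that these choices are solutions up to the bound N.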

It remains to find the least solutions for the final two positions. Filling in the values that we already have, we find the following for the fifth row:

$$\begin{aligned} &a \parallel b + a \cdot b + (a^\star \parallel b) \cdot a \cdot a^\star + (a \cdot a^\star \parallel b) \cdot a^\star \\ &\quad + a^\star \cdot x(a \cdot a^\star \parallel b) + a \cdot a^\star \cdot x(a^\star \parallel b) + a \cdot a^\star \parallel b \leq\_{\mathsf{CKA}} x(a \cdot a^\star \parallel b) \end{aligned} \tag{8}$$

Applying the exchange law<sup>5</sup> to the first three terms, we find that they are contained in (a · a⋆ ∥ b) · a⋆, as is the last term; (8) thus simplifies to

$$(a \cdot a^\star \parallel b) \cdot a^\star + a^\star \cdot x(a \cdot a^\star \parallel b) + a \cdot a^\star \cdot x(a^\star \parallel b) \leq\_{\mathsf{CKA}} x(a \cdot a^\star \parallel b) \tag{9}$$

By the least fixpoint axiom, we find that

$$a^\star \cdot (a \cdot a^\star \parallel b) \cdot a^\star + a \cdot a^\star \cdot x(a^\star \parallel b) \leq\_{\mathsf{CKA}} x(a \cdot a^\star \parallel b) \tag{10}$$

For the sixth row, we find that after filling in the solved positions, we have

$$\begin{aligned} &b + b + (a^\star \parallel b) \cdot a \cdot a^\star + (a \cdot a^\star \parallel b) \cdot a^\star \\ &\quad + a^\star \cdot x(a \cdot a^\star \parallel b) + a \cdot a^\star \cdot x(a^\star \parallel b) + a^\star \parallel b \leq\_{\mathsf{CKA}} x(a^\star \parallel b) \end{aligned} \tag{11}$$

Simplifying and applying the exchange law as before, it follows that

$$(a^\star \parallel b) \cdot a^\star + a^\star \cdot x(a \cdot a^\star \parallel b) + a \cdot a^\star \cdot x(a^\star \parallel b) \leq\_{\mathsf{CKA}} x(a^\star \parallel b) \tag{12}$$

We then substitute (10) into (12) to find that

$$(a^\star \parallel b) \cdot a^\star + a \cdot a^\star \cdot x(a^\star \parallel b) \leq\_{\mathsf{CKA}} x(a^\star \parallel b) \tag{13}$$

which, by the least fixpoint axiom, tells us that a⋆ · (a⋆ ∥ b) · a⋆ ≤_CKA x(a⋆ ∥ b). Plugging the latter back into (9), we find that

$$a^\star \cdot (a \cdot a^\star \parallel b) \cdot a^\star + a \cdot a^\star \cdot a^\star \cdot (a^\star \parallel b) \cdot a^\star \leq\_{\mathsf{CKA}} x(a \cdot a^\star \parallel b) \tag{14}$$

which can, using the exchange law, be reworked into

$$a^\star \cdot (a \cdot a^\star \parallel b) \cdot a^\star \leq\_{\mathsf{CKA}} x(a \cdot a^\star \parallel b) \tag{15}$$

Now, if we choose x(a · a⋆ ∥ b) = a⋆ · (a · a⋆ ∥ b) · a⋆ and x(a⋆ ∥ b) = a⋆ · (a⋆ ∥ b) · a⋆, we find that these choices satisfy (9) and (12), making them part of a solution; by construction, they are also the least solutions.

In summary, x is a solution to the linear system, and by construction it is also the least solution. The reader is encouraged to verify that our choice of x(a⋆ ∥ b) is indeed a closure of a⋆ ∥ b.

<sup>5</sup> A caveat here is that applying the exchange law indiscriminately may lead to a term that is not a closure (specifically, it may violate the semantic requirement in Definition 4.1). The algorithm used to solve arbitrary linear systems in Lemma 3.12 does not make use of the exchange law to simplify terms, and thus avoids this pitfall.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Concurrency and Distribution

## **Correctness of a Concurrent Object Collector for Actor Languages**

Juliana Franco<sup>1(B)</sup>, Sylvan Clebsch<sup>2</sup>, Sophia Drossopoulou<sup>1</sup>, Jan Vitek<sup>3,4</sup>, and Tobias Wrigstad<sup>5</sup>

> <sup>1</sup> Imperial College London, London, UK j.vicente-franco@imperial.ac.uk <sup>2</sup> Microsoft Research Cambridge, Cambridge, UK <sup>3</sup> Northeastern University, Boston, USA <sup>4</sup> CVUT, Prague, Czech Republic <sup>5</sup> Uppsala University, Uppsala, Sweden

**Abstract.** ORCA is a garbage collection protocol for actor-based programs. Multiple actors may mutate the heap while the collector is running without any dedicated synchronisation. ORCA is applicable to any actor language whose type system prevents data races and which supports causal message delivery. We present a model of ORCA which is parametric to the host language and its type system. We describe the interplay between the host language and the collector. We give invariants preserved by ORCA, and prove its soundness and completeness.

### **1 Introduction**

Actor-based systems are massively parallel programs in which individual actors communicate by exchanging messages. In such systems it is essential to be able to manage data automatically with as little synchronisation as possible. In previous work [9,12], we introduced the ORCA protocol for garbage collection in actor-based systems. ORCA is language-agnostic, and it allows for concurrent collection of objects in actor-based programs with no additional locking or synchronisation, no copying on message passing and no stop-the-world steps. ORCA can be implemented in any actor-based system or language that has a type system which prevents data races and that supports causal message delivery. There are currently two instantiations of ORCA, one is for Pony [8,11] and the other for Encore [5]. We hypothesise that ORCA could be applied to other actor-based systems that use static types to enforce isolation [7,21,28,36]. For libraries, such as Akka, which provide actor-like facilities, pluggable type systems could be used to enforce isolation [20].

This paper develops a formal model of ORCA. More specifically, the paper contributions are:

1. Identification of the requirements that the host language must statically guarantee;

© The Author(s) 2018 A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 885–911, 2018. https://doi.org/10.1007/978-3-319-89884-1\_31


A formal model facilitates the understanding of how ORCA can be applied to different languages. It also allows us to explore extensions such as shared mutable state across actors [40], reduction of tracing of immutable references [12], or incorporation of borrowing [4]. Alternative implementations of ORCA that rely on deep copying (*e.g.*, to reduce type system complexity) across actors on different machines can also be explored through our formalism.

Developing a formal model of ORCA presents challenges:


The full proofs and omitted definitions are available in appendix [16].

### **2 Host Language Requirements**

ORCA makes some assumptions about its host language; we describe them here.

#### **2.1 Actors and Objects**

Actors are active entities with a thread of control, while objects are data structures. Both actors and objects may have fields and methods. Method calls on objects are synchronous, whereas method calls on actors amount to asynchronous message sends; the latter are called *behaviours*. Messages are stored in a FIFO queue. When idle, an actor processes the top message from its queue. At any given point in time an actor may be either idle, executing a behaviour, or collecting garbage.

**Fig. 1.** Actors and objects. Full arrows are references, grey arrows are overwritten references: references that no longer exist.


**Fig. 2.** Capabilities. Heap mutation may modify what object is reachable through a path, but not the path's capability.

Figure 1 shows actors α<sup>1</sup> and α2, and objects ω<sup>1</sup> to ω4. In [16] we show how to create this object graph in Pony. In Fig. 1(a), actor α<sup>1</sup> points to object ω<sup>1</sup> through field f<sup>1</sup> and to ω<sup>2</sup> through field f3, and object ω<sup>1</sup> points to ω<sup>3</sup> through field f5. In Fig. 1(b), actor α<sup>1</sup> creates ω<sup>4</sup> and assigns it to this.f1.f5. In Fig. 1(c), α<sup>1</sup> has given up its reference to ω<sup>1</sup> and sent it to α2, which stored it in field f6. Note that the process of sending sent not only ω<sup>1</sup> but also, implicitly, ω4.

#### **2.2 Mutation, Transfer and Accessibility**

Message passing is the only way to share objects. This falls out of the capability system. If an actor shares an object with another actor, then either it gives up the object or neither actor has a write capability to that object. For example, after α<sup>1</sup> sends ω<sup>1</sup> to α2, it cannot mutate ω1. As a consequence, heap mutation only decreases accessibility, while message sends can transfer accessibility from sender to receiver. When sending immutable data the sender does not need to transfer accessibility. However, when it sends a mutable object it cannot keep the ability to read or to write the object. Thus, upon message send of a mutable object, the actor must consume, or destroy, its reference to that object.
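The consume-on-send discipline can be sketched in a few lines. This is a toy illustration of the rule above, not the Pony/Encore API; all names are ours.

```python
# Toy sketch of Sect. 2.2: sending a mutable object consumes the
# sender's reference, while immutable (read) data may be shared
# without the sender giving anything up.

def send(sender_refs, receiver_queue, obj):
    """Move obj from sender_refs (obj -> capability) into a message."""
    cap = sender_refs[obj]
    if cap == "write":
        # Mutable: the sender must consume (destroy) its reference,
        # so it can no longer read or write the object.
        del sender_refs[obj]
    receiver_queue.append((obj, cap))
```

After sending a `write` object the sender's table no longer mentions it, mirroring how α<sup>1</sup> loses access to ω<sup>1</sup> after sending it to α2.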

#### **2.3 Capabilities and Accessibility**

ORCA assumes that a host language's type system assigns *access rights* to paths. A path is a sequence of field names. We call these access rights *capabilities*.

We expect the following three capabilities: read, write, and tag. The first two allow reading and writing an object's fields, respectively. The tag capability only allows identity comparison and sending the object in a message. The type system must ensure that actors have no read-write races. This is natural for actor languages [5, 7,11,21].

Figure 2 shows capabilities assigned to the paths in Fig. 1: α1.f1.f<sup>5</sup> has capability write, thus α<sup>1</sup> can read and write to the object reachable from that path. Note that capabilities assigned to paths are immutable, while the contents of those paths may change. For example, in Fig. 1(a), α<sup>1</sup> can write to ω<sup>3</sup> through path f1.f5, while in Fig. 1(b) it can write to ω<sup>4</sup> through the same path. In Fig. 1(a) and (b), α<sup>2</sup> can use the address of ω<sup>1</sup> but cannot read or write it, due to the tag capability, and therefore cannot access ω<sup>3</sup> (in Fig. 1(a)) nor ω<sup>4</sup> (in Fig. 1(b)). However, in Fig. 1(c) the situation reverses: α2, which received ω<sup>1</sup> with write capability is now able to reach it through field f6, and therefore ω4. Notice that the existence of a path from an actor to an object does not imply that the object is accessible to the actor: In Fig. 1(a), there is a path from α<sup>2</sup> to ω3, but α<sup>2</sup> cannot access ω3. Capabilities protect against data races by ensuring that if an object can be mutated by an actor, then no other actor can access its fields.

### **2.4 Causality**

ORCA uses messages to deliver protocol-related information; it thus requires causal delivery: messages must be delivered after any and all messages that caused them. Causality is the smallest transitive relation such that if a message m′ is sent by some actor after it received or sent a message m, then m is a cause of m′. Causal delivery entails that m′ be delivered after m.

For example, if actor α<sup>1</sup> sends m<sup>1</sup> to actor α2, then sends m<sup>2</sup> to actor α3, and α<sup>3</sup> receives m<sup>2</sup> and sends m<sup>3</sup> to α2, then m<sup>1</sup> is a cause of m2, and m<sup>2</sup> is a cause of m3. Causal delivery requires that α<sup>2</sup> receive m<sup>1</sup> before receiving m3. No requirements are made on the order of delivery to different actors.
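The delivery constraint can be checked mechanically. The following is a toy illustration (our names, not part of ORCA): given each message's transitive causes, a per-actor delivery order is causal if no message arrives before a cause that the same actor also receives.

```python
# Toy check of causal delivery: a cause may never arrive after its
# effect at the same actor. `causes` maps a message to the set of
# messages that (transitively) caused it.

def is_causal(delivery_order, causes):
    delivered = set()
    for m in delivery_order:
        for c in causes.get(m, ()):
            if c in delivery_order and c not in delivered:
                return False   # a cause would arrive after its effect
        delivered.add(m)
    return True
```

On the example above (m<sup>1</sup> causes m<sup>2</sup>, m<sup>2</sup> causes m<sup>3</sup>, hence transitively m<sup>1</sup> causes m<sup>3</sup>), α<sup>2</sup> may receive m<sup>1</sup> then m<sup>3</sup>, but not the reverse.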

### **3 Overview of** ORCA

We introduce ORCA and discuss how to localise the necessary information to guarantee safe deallocation of objects in the presence of sharing. Every actor has a local heap in which it allocates objects. An actor *owns* the objects it has allocated, and ownership is fixed for an object's life-time, but actors are free to reference objects that they do not own. Actors are obligated to collect their own objects once these are no longer needed. While collecting, an actor must be able to determine whether an object can be deallocated using only local information. This allows all other actors to make progress at any point.

### **3.1 Mutation and Collection**

ORCA relies on capabilities for actors to reference objects owned by other actors and to support concurrent mutation to parts of the heap that are not being concurrently collected. Capabilities avoid the need for barriers.

**I<sup>1</sup>** An object accessible with write capability from an actor is not accessible with read or write capability from any other actor.

This invariant ensures an actor, while executing garbage collection, can safely trace any object to which it has read or write access without the need to protect against concurrent mutation from other actors.

### **3.2 Local Collection**

An actor can collect its objects based on local information without consulting other actors. For this to be safe, the actor must know that an owned, locally inaccessible, object is also globally inaccessible (*i.e.*, inaccessible from any other actors or messages)<sup>1</sup>. Shared objects are reference counted by their owner to ensure:

**I<sup>2</sup>** An object accessible from a message queue or from a non-owning actor has reference count larger than zero in the owning actor.

Thus, a locally inaccessible object with a reference count of 0 can be collected.
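The local collection decision can be sketched as a plain trace plus a reference-count test. This is a toy model under our own naming, assuming **I<sup>2</sup>** holds so that RC = 0 implies global inaccessibility.

```python
# Toy sketch of Sect. 3.2: an owned object may be reclaimed iff it is
# unreachable from the actor's local roots AND its local reference
# count is 0 (by I2, no other actor or message can then reach it).

def collectable(obj, roots, heap, rc):
    """heap maps each object to the set of objects its fields reference."""
    reachable, work = set(), list(roots)
    while work:                      # purely local trace, no synchronisation
        o = work.pop()
        if o not in reachable:
            reachable.add(o)
            work.extend(heap.get(o, ()))
    return obj not in reachable and rc.get(obj, 0) == 0
```

Note that the decision consults only the actor's own heap and its own RC table, which is what lets other actors make progress during collection.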

### **3.3 Messages and Collection**

**I<sup>1</sup>** and **I<sup>2</sup>** are sufficient to ensure that local collection is safe. Maintaining **I<sup>2</sup>** is not trivial, as accessibility is affected by message sends. Moreover, it is possible for an actor to share a read object with another actor through a message. What if that actor drops its reference to the object? The object's owner should be informed, so it can decrease its reference count. What happens when an actor receives an object in a message? The object's owner should be informed, so that it can increase its reference count. To reduce message traffic, ORCA uses *distributed, weighted, deferred* reference counts. Each actor maintains reference counts that track the sharing of its objects. It also maintains counts for "foreign objects", tracking references to objects owned by other actors. This reference count for non-owning actors is what allows sending/receiving objects without having to inform their owner while maintaining **I2**. For any object or actor ι, we denote with LRC(ι) the reference count for ι in ι's owner, and with FRC(ι) the sum of the reference counts for ι in all other actors. The counts do not reflect the number of references, but rather the existence of references:

**I<sup>3</sup>** If a non-owning actor can access an object through a path from its fields or call stack, its reference count for this object is greater than 0.

An object is globally accessible if it is accessible from any actor or from a message in some queue. Messages include reference increment and decrement messages; these are ORCA-level messages and are not visible to applications. We introduce two logical counters: AMC(ι) to account for the number of application

<sup>1</sup> For example, in Fig. 1(c) ω<sup>4</sup> is locally inaccessible, but globally accessible.

**Fig. 3.** Black arrows are references, numbered in creation order. Blue solid arrows are application messages and blue dashed arrows ORCA-level messages. (Color figure online)

messages with paths to ι, and OMC(ι) to account for ORCA-level messages with reference count increment and decrement requests. These counters are not present at run-time, but they will be handy for reasoning about ORCA. The owner's view of an object is described by the LRC and the OMC, while the foreign view is described by the FRC and the AMC. These two views must agree:

$$\mathbf{I\_4} \qquad \forall \ \iota \ . \quad \text{LRC}(\iota) + \text{OMC}(\iota) = \text{AMC}(\iota) + \text{FRC}(\iota)$$

**I2**, **I<sup>3</sup>** and **I<sup>4</sup>** imply that a locally inaccessible object with LRC = 0 can be reclaimed.

### **3.4 Example**

Consider actors Andy, Bart and Catalin, and steps from Fig. 3.

*Initial State.* Let ω be a newly allocated object. As it is only accessible to its owning actor, Andy, there is no entry for it in any RC.

*Sharing* ω*.* When Andy shares ω with Bart, ω is placed on Bart's message queue, meaning that AMC(ω) = 1. This is reflected by setting RCAndy(ω) to 1. This preserves **I<sup>4</sup>** and the other invariants. When Bart takes the message with ω from his queue, AMC(ω) becomes zero, and Bart sets his foreign reference count for ω to 1, that is, RCBart(ω) = 1. When Bart shares ω with Catalin, we get AMC(ω) = 1. To preserve **I4**, Bart could set RCBart(ω) to 0, but this would break **I3**. Instead, Bart sends an ORCA-level message to Andy, asking him to increment his (local) reference count by some n, and sets his own RCBart(ω) to n. <sup>2</sup> This preserves **I<sup>4</sup>** and the other invariants. When Catalin receives the message later on, she will behave similarly to Bart in step 2, and set RCCatalin(ω)= 1.

The general rule is that when an actor sends one of its objects, it increments the corresponding (local) RC by 1 (reflecting the increasing number of foreign references) but when it sends a non-owned object, it decrements the corresponding (foreign) RC (reflecting a transfer of some of its stake in the object). Special care needs to be taken when the sender's RC is 1.

<sup>2</sup> This step can be understood as if Bart "borrowed" <sup>n</sup> units from Andy, added <sup>n</sup> <sup>−</sup> <sup>1</sup> to his own RC, and gave 1 to the AMC, to reach Catalin eventually.

Further note that if Andy, the owner of ω, received ω, he would decrease his counter for ω rather than increase it, as his reference count denotes foreign references to ω. When an actor receives one of its owned objects, it *decrements* the corresponding (local) RC by 1 but when it receives a non-owned object, it *increments* the corresponding (foreign) RC by 1.

*Dropping References to* ω*.* Subsequent to sharing ω with Catalin, Bart performs GC, and traces his heap without reaching ω (maybe because he did not store ω in a field). This means that Bart has given up his stake in ω. This is reflected by sending a message to Andy to decrease his RC for ω by n, and setting Bart's RC for ω to 0. Andy's local count of the foreign references to ω is decreased piecemeal like this, until LRC(ω) reaches zero. At this point, tracing Andy's local heap can determine whether ω should be collected.
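The steps above can be replayed mechanically against invariant **I<sup>4</sup>**. The following is a toy simulation with our own names; the concrete n is arbitrary, and the counter updates follow the text and footnote 2.

```python
# Toy replay of the Andy/Bart/Catalin example, checking
# I4: LRC + OMC = AMC + FRC, after every step.

class Counters:
    def __init__(self):
        self.lrc = self.frc = self.amc = self.omc = 0

    def i4(self):
        return self.lrc + self.omc == self.amc + self.frc

c, n = Counters(), 256
steps = [
    ("Andy sends omega to Bart",    dict(amc=+1, lrc=+1)),
    ("Bart receives omega",         dict(amc=-1, frc=+1)),
    # Bart "borrows" n from Andy: +n is requested via an ORCA message,
    # Bart's own RC goes from 1 to n, and 1 unit travels with the
    # application message to Catalin.
    ("Bart sends omega to Catalin", dict(omc=+n, frc=+(n - 1), amc=+1)),
    ("Andy applies orca(+n)",       dict(omc=-n, lrc=+n)),
    ("Catalin receives omega",      dict(amc=-1, frc=+1)),
]
for name, delta in steps:
    for counter, d in delta.items():
        setattr(c, counter, getattr(c, counter) + d)
    assert c.i4(), name
```

Every intermediate state satisfies **I<sup>4</sup>**; at the end LRC = FRC = n + 1 with no in-flight messages.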

*Further Aspects.* We briefly outline further aspects which play a role in ORCA.


Finally, we reflect on the nature of reference counts: they are *distributed*, in the sense that an object's owner and every actor referencing it keep separate counts; *weighted*, in that they do not reflect the number of aliases; and *deferred*, in that they are not manipulated immediately on alias creation or destruction, and that non-local increments/decrements are handled asynchronously.

### **4 The ORCA Protocol**

We assume enumerable, disjoint sets *ActorAddr* and *ObjAddr*, for addresses of actors and objects. The union of the two is the set of addresses, including null. We require a mapping *Class* that gives the name of the class of each actor in a given configuration, and a mapping O that returns the owner of an address

> *Addr* = *ActorAddr* ∪ *ObjAddr* ∪ {null}    *Class* : *Config* × *ActorAddr* → *ClassId*    O : *Addr* → *ActorAddr*

such that the owner of an actor is the actor itself, *i.e.*, <sup>∀</sup>α∈*ActorAddr*. <sup>O</sup>(α) = <sup>α</sup>.

Definition <sup>1</sup> describes run-time configurations, <sup>C</sup>. They consist of a heap, <sup>χ</sup>, which maps addresses and field identifiers to addresses,<sup>3</sup> and an actor map, *as*, from actor addresses to actors. Actors consist of a frame, a queue, a reference count table, a state, a working set, marks, and a program counter. Frames are either empty, or consist of the identifier for the currently executing behaviour, and a mapping from variables to addresses. Queues are sequences of messages. A message is either an *application message* of the form app(φ) denoting a highlevel language message with the frame φ, or an ORCA message, of the form orca(ι: z), denoting an in-flight request for a reference count change for ι by z. The state distinguishes whether the actor is idle, or executing some behaviour, or performing garbage collection. We discuss states, working sets, marks, and program counters in Sect. 4.3. We use naming conventions: <sup>α</sup> <sup>∈</sup> *ActorAddr*; <sup>ω</sup> <sup>∈</sup> *ObjAddr*; <sup>ι</sup> <sup>∈</sup> *Addr*; <sup>z</sup> <sup>∈</sup> <sup>Z</sup>; <sup>n</sup> <sup>∈</sup> <sup>N</sup>; <sup>b</sup> <sup>∈</sup> *BId*; <sup>x</sup> <sup>∈</sup> *VarId*; <sup>A</sup> <sup>∈</sup> *ClassId*; and ιs for a sequence of addresses <sup>ι</sup>1...ιn. We write <sup>C</sup>.heap for <sup>C</sup>'s heap; and α.quC, or α.rcC, or α.frameC, or α.st<sup>C</sup> for the queue, reference count table, frame or state of actor <sup>α</sup> in configuration <sup>C</sup>, respectively.

### **Definition 1 (Runtime entities and notation)**

$$\begin{aligned} \mathcal{C} &\in \textit{Config} = \textit{Heap} \times \textit{Actors} \\ \chi &\in \textit{Heap} = (\textit{Addr}\setminus\{\text{null}\}) \times \textit{FId} \rightharpoonup \textit{Addr} \\ \textit{as} &\in \textit{Actors} = \textit{ActorAddr} \to \textit{Actor} \\ a &\in \textit{Actor} = \textit{Frame} \times \textit{Queue} \times \textit{ReferenceCount} \times \textit{State} \times \textit{Workset} \times \textit{Marks} \times \textit{PC} \\ \phi &\in \textit{Frame} = \emptyset \ \cup\ (\textit{BId} \times \textit{LocalMap}) \\ \psi &\in \textit{LocalMap} = \textit{VarId} \to \textit{Addr} \\ q &\in \textit{Queue} = \textit{Message}^{\star} \\ m &\in \textit{Message} ::= \mathsf{orca}(\iota : z) \ \mid\ \mathsf{app}(\phi) \\ \mathit{rc} &\in \textit{ReferenceCount} = \textit{Addr} \to \mathbb{N} \end{aligned}$$

*State, Workset, Marks, and PC described in Definition 7.*
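Definition 1 translates naturally into data types. The following is a rough Python rendering, intended only as a modelling aid; addresses and identifiers are collapsed to strings, and all names are ours.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

# Rough rendering of Definition 1: a configuration is a heap plus an
# actor map; an actor carries a frame, queue, RC table, state,
# workset, marks, and a (ghost) program counter.

Addr, FId, VarId, BId = str, str, str, str
Frame = Tuple[BId, Dict[VarId, Addr]]          # empty frame modelled as None

@dataclass
class App:                                     # app(phi)
    frame: Frame

@dataclass
class Orca:                                    # orca(iota : z)
    iota: Addr
    z: int

@dataclass
class Actor:
    frame: Optional[Frame]
    queue: List[Union[App, Orca]]
    rc: Dict[Addr, int]                        # ReferenceCount
    state: str                                 # see Definition 7
    ws: set = field(default_factory=set)       # Workset
    marks: Dict[Addr, str] = field(default_factory=dict)  # R / U
    pc: int = 4                                # ghost program counter

@dataclass
class Config:
    heap: Dict[Tuple[Addr, FId], Addr]         # (address, field) -> address
    actors: Dict[Addr, Actor]
```

A fragment of C0, for instance, can be written as a `Config` whose heap maps (ω6, f7) to ω8 and whose actor α<sup>3</sup> queues an ORCA message for ω7.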

**Example:** Figure 4 shows C0, our running example for a runtime configuration. It has three actors: α1–α3, represented by light grey boxes, and eight objects, ω1–ω8, represented by circles. We show ownership by placing the objects in square boxes, *e.g.* <sup>O</sup>(ω7) = <sup>α</sup>1. We show references through arrows, *e.g.* <sup>ω</sup><sup>6</sup> references <sup>ω</sup><sup>8</sup> through field <sup>f</sup>7, that is, <sup>C</sup>0.heap(ω6, f7) = <sup>ω</sup>8. The frame of <sup>α</sup><sup>2</sup> contains behaviour identifier b′, and maps x to ω8. All other frames are empty. The message queue of α<sup>1</sup> contains an application message for behaviour b with argument ω<sup>5</sup> for x, the queue of α<sup>2</sup> is empty, and the queue of α<sup>3</sup> contains an ORCA message for <sup>ω</sup>7. The bottom part shows reference count tables: <sup>α</sup>1.rc<sup>C</sup><sup>0</sup> (α1) = 21,

<sup>3</sup> Note that we omitted the class of objects. As our model is parametric with the type system, we can abstract from classes, and simplify our model.

and <sup>α</sup>1.rcC<sup>0</sup> (ω7) = 50. Entries of owned addresses are shaded. Since <sup>α</sup><sup>2</sup> owns <sup>α</sup><sup>2</sup> and <sup>ω</sup>2, the entries for <sup>α</sup>2.rcC<sup>0</sup> (α2) and <sup>α</sup>2.rcC<sup>0</sup> (ω2) are shaded. Note that <sup>α</sup><sup>1</sup> has a non-zero entry for ω7, even though there is no path from α<sup>1</sup> to ω7. There is no entry for ω1; no such entry is needed, because no actor except for its owner has a path to it. The 0 values indicate potentially non-existent entries in the corresponding tables; for example, the reference count table for actor α<sup>3</sup> needs only to contain entries for α1, α3, ω3, and ω4. Ownership does not restrict access to an address: *e.g.* actor α<sup>1</sup> does not own object ω3, yet may access it through the path this.f1.f2.f3, may read its field through this.f1.f2.f3.f4, and may mutate it, *e.g.* by this.f1.f2.f<sup>3</sup> = this.f1.

Lookup of fields in a configuration is defined in the obvious way, *i.e.*

**Definition 2.** <sup>C</sup>(ι.f) ≡ C.heap(ι, f)*, and* <sup>C</sup>(ι.f.f′) ≡ C.heap(C(ι.f), f′)

### **4.1 Capabilities and Accessibility**

ORCA considers three capabilities:

$$\kappa \in \textit{Capability} = \{\mathsf{read},\ \mathsf{write},\ \mathsf{tag}\},$$

where read allows reading, write allows reading and writing, and tag forbids both read and write, but allows the use of an object's address. To describe the capability at which objects are visible from actors we use the concepts of *static* and *dynamic paths*.

*Static paths* consist of the keyword this (indicating a path starting at the current actor), or the name of a behaviour, b, and a variable, x, (indicating a path starting at local variable x from a frame of b), followed by any number of fields, f.

$$sp ::= \mathsf{this} \quad \mid \quad b.x \quad \mid \quad sp.f$$

**Fig. 4.** Configuration C0. ω<sup>1</sup> is absent in the ref. counts, it has not been shared.

The host language must assign these capabilities to static paths. Thus, we assume it provides a static judgement of the form

$$A \vdash sp \colon \kappa \qquad \text{where } A \in ClassId$$

meaning that a static path *sp* has capability κ when "seen" from a class A. We highlight static judgments, *i.e.*, those provided by the type system, in blue.

We expect the type system to guarantee that read and write access rights are "deep", meaning that all paths to a read capability must go through other read or write capabilities (**A1**), and all paths to a write capability must go through write capabilities (**A2**).

**Axiom 1** *For class identifier* A*, static path* sp*, field* f*, and capability* κ*, we assume:* **A1** A ⊢ *sp*.f : κ −→ ∃κ′ ≠ tag. A ⊢ *sp* : κ′. **A2** A ⊢ *sp*.f : write −→ A ⊢ *sp* : write.

Such requirements are satisfied by many type systems with read-only references or immutability (*e.g.* [7,11,18,23,29,33,37,41]). An implication of **A1** and **A2** is that capabilities degrade with growing paths, *i.e.*, the prefix of a path has more rights than its extensions. More precisely: A ⊢ *sp* : κ and A ⊢ *sp*.f : κ′ imply that κ ≤ κ′, where we define write < read < tag, and κ ≤ κ′ *iff* κ = κ′ or κ < κ′.
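The ordering and the degradation property can be stated in a few lines of code. This is an illustrative sketch of the order write < read < tag, with names of our choosing.

```python
# The capability order write < read < tag, and the degradation
# property implied by A1/A2: along sp, sp.f, sp.f.f', ...
# capabilities can only stay the same or weaken.

ORDER = {"write": 0, "read": 1, "tag": 2}

def leq(k1, k2):
    """kappa <= kappa' in the order write < read < tag."""
    return ORDER[k1] <= ORDER[k2]

def degrades(caps_along_path):
    """True iff successive capabilities never strengthen."""
    return all(leq(a, b)
               for a, b in zip(caps_along_path, caps_along_path[1:]))
```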

**Example:** Table 1 shows capabilities for some paths from Fig. 4. Thus, A<sup>1</sup> ⊢ this.f<sup>1</sup> : write, and A<sup>2</sup> ⊢ b′.x : write, and A<sup>2</sup> ⊢ this.f<sup>8</sup> : tag. The latter, together with **A1**, gives that A<sup>2</sup> ⊬ this.f8.f : κ for all κ and f.

As we shall see later, the existence of a path does not imply that the path may be navigated. For example, <sup>C</sup>0(α2.f8.f4) = <sup>ω</sup>4, but actor <sup>α</sup><sup>2</sup> cannot access <sup>ω</sup><sup>4</sup> because A<sup>2</sup> ⊢ this.f<sup>8</sup> : tag.

Moreover, it is possible for a path to have a capability while not being defined. For example, Table 1 shows A<sup>1</sup> ⊢ this.f1.f<sup>2</sup> : write, and it would be possible to have <sup>C</sup>i(α1.f1) = null for some configuration <sup>C</sup><sup>i</sup> that derives from <sup>C</sup>0.



**Table 1.** Capabilities for paths, where A<sup>1</sup> = Class(α1) and A<sup>2</sup> = Class(α2).

*Dynamic paths* (in short paths p) start at the actor's fields, or frame, or at some pending message in an actor's queue (the latter cannot be navigated yet, but can be once the message is taken off the queue). Dynamic paths may be local paths (lp) or message paths. Local paths consist of this or a variable x followed by any number of fields f. In such paths, this is the current actor, and x is a local variable from the current frame. Message paths consist of k.x followed by a sequence of fields. If <sup>k</sup> <sup>≥</sup> 0, then k.x indicates the local variable <sup>x</sup> from the <sup>k</sup>-th message from the queue; <sup>k</sup> <sup>=</sup> <sup>−</sup>1 indicates variables from either (a) a message that has been popped from the queue, but whose frame has not yet been pushed onto the stack, or (b) a message whose frame has been created but not yet been pushed onto the queue. Thus, <sup>k</sup> <sup>=</sup> <sup>−</sup>1 indicates that either (a) a frame will be pushed onto the stack, during message receiving, or (b) a message will be pushed onto the queue during message sending.

<sup>p</sup> <sup>∈</sup> *Path* ::= lp <sup>|</sup> mp lp ::= this <sup>|</sup> <sup>x</sup> <sup>|</sup> lp.f mp ::= k.x <sup>|</sup> mp.f

We define accessibility as the lookup of a path provided that the capability for this path is defined. The *partial* function A returns a pair: the address accessible from actor α following path p, and the capability of α on p. A path of the form *p*.owner returns the owner of the object accessible through *p* and capability tag.

### **Definition 3 (accessibility).** *The partial function*

A : *Config* × *ActorAddr* × *Path* → (*Addr* × *Capability*) *is defined as*


We use <sup>A</sup>C(α, p) = <sup>ι</sup> as shorthand for <sup>∃</sup>κ. <sup>A</sup>C(α, p)=(ι, κ). The second and third case above ensure that the capability of a message path is the same as when the message has been taken off the queue and placed on the frame.

**Example:** We obtain that <sup>A</sup><sup>C</sup><sup>0</sup> (α1,this.f1.f2.f3)=(ω3,write), from the fact that Fig. <sup>4</sup> says that <sup>C</sup>0(α1.f1.f2.f3) = <sup>ω</sup><sup>3</sup> and from the fact that Table <sup>1</sup> says that <sup>A</sup><sup>1</sup> this.f1.f2.f<sup>3</sup> : write. Similarly, <sup>A</sup><sup>C</sup><sup>0</sup> (α2,this.f8)=(ω3,tag), and <sup>A</sup><sup>C</sup><sup>0</sup> (α2, x )=(ω8,write), and <sup>A</sup><sup>C</sup><sup>0</sup> (α1, <sup>0</sup>.x.f5.f7)=(ω8,tag).

Both <sup>A</sup><sup>C</sup><sup>0</sup> (α1,this.f1.f2.f3), and <sup>A</sup><sup>C</sup><sup>0</sup> (α2,this.f8) describe paths from actors' fields, while <sup>A</sup><sup>C</sup><sup>0</sup> (α2, x ) describes a path from the actor's frame, and finally <sup>A</sup><sup>C</sup><sup>0</sup> (α1, <sup>0</sup>.x.f5.f7) is a path from the message queue.

Accessibility describes what may be read or written to: <sup>A</sup><sup>C</sup><sup>0</sup> (α1,this.f1.f2.f3) = (ω3,write), therefore actor α<sup>1</sup> may mutate object ω3. However, this mutation is not visible by <sup>α</sup>2, even though <sup>C</sup>0(α2.f8) = <sup>ω</sup>3, because <sup>A</sup>C<sup>0</sup> (α2,this.f8)=(ω3,tag), which means that actor α<sup>2</sup> has only opaque access to ω3.

Accessibility plays a role in collection: if the reference f<sup>3</sup> were to be dropped, it would be safe to collect ω4. Even though there exists a path from α<sup>2</sup> to ω4, object ω<sup>4</sup> is not accessible to α2: the path this.f8.f<sup>4</sup> leads to ω<sup>4</sup> but will never be navigated (AC<sup>0</sup> (α2,this.f8.f4) is undefined). Also, <sup>A</sup>C(α2,this.f8.owner)=(α3,tag); thus, as long as <sup>ω</sup><sup>3</sup> is accessible from some actor, *e.g.* through <sup>C</sup>(α2.f8) = <sup>ω</sup>3, actor α<sup>3</sup> will not be collected.
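Accessibility for local field paths can be sketched as a guarded heap walk. This is an illustrative fragment of Definition 3, restricted to paths of the form this.f1...fn; the `caps` table stands in for the host type system's static judgement A ⊢ sp : κ, and all names are ours.

```python
# Sketch of accessibility for local field paths: the lookup succeeds
# only when the static judgement assigns the path a capability;
# otherwise accessibility is undefined (here: None).

def access(heap, actor, fields, caps):
    kappa = caps.get(("this",) + tuple(fields))
    if kappa is None:
        return None                    # no capability: never navigated
    iota = actor
    for f in fields:
        iota = heap.get((iota, f))
        if iota is None:
            return None                # dangling path
    return (iota, kappa)
```

On the running example this gives (ω3, tag) for α2's this.f8, while this.f8.f4 is undefined even though the heap path to ω<sup>4</sup> exists.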

Because the class of an actor as well as the capability attached to a static path are constant throughout program execution, the capabilities of paths starting from an actor's fields or from the same frame are also constant.

**Lemma 1.** *For actor* α*, behaviour* b*, variable* x*, fields* $\overline{f}$*, capabilities* κ*,* κ′*, and configurations* C *and* C′*, such that* C *reduces to* C′ *in one or more steps:*

$$\begin{aligned} &- \ \mathcal{A}_{\mathcal{C}}(\alpha,\mathsf{this}.\overline{f}) = (\iota,\kappa) \ \wedge\ \mathcal{A}_{\mathcal{C}'}(\alpha,\mathsf{this}.\overline{f}) = (\iota',\kappa') \ \longrightarrow\ \kappa=\kappa' \\ &- \ \mathcal{A}_{\mathcal{C}}(\alpha,x.\overline{f}) = (\iota,\kappa) \ \wedge\ \mathcal{A}_{\mathcal{C}'}(\alpha,x.\overline{f}) = (\iota',\kappa') \ \wedge\ \alpha.\mathsf{frame}_{\mathcal{C}} = (b,\_) \ \wedge\ \alpha.\mathsf{frame}_{\mathcal{C}'} = (b,\_) \ \longrightarrow\ \kappa=\kappa' \end{aligned}$$

### **4.2 Well-Formed Configurations**

We characterise data-race free configurations (⊢ C ♦):

**Definition 4 (Data-race freedom).** ⊢ C ♦ **iff**

$$\begin{aligned} &\forall \alpha, \alpha', p, p', \iota, \kappa, \kappa'.\quad \alpha \neq \alpha' \ \wedge\ \mathcal{A}_{\mathcal{C}}(\alpha, p) = (\iota, \kappa) \ \wedge\ \mathcal{A}_{\mathcal{C}}(\alpha', p') = (\iota, \kappa') \ \longrightarrow\ \kappa \sim \kappa' \\ &\text{where}\quad \kappa \sim \kappa' \ \textit{iff}\ \ (\kappa = \mathsf{write} \longrightarrow \kappa' = \mathsf{tag}) \ \wedge\ (\kappa' = \mathsf{write} \longrightarrow \kappa = \mathsf{tag}) \end{aligned}$$

This definition captures invariant **I1**. The remaining invariants depend on the four derived counters introduced in Sect. 3. Here we define LRC and FRC, and give a preliminary definition of AMC and OMC.
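The compatibility relation κ ∼ κ′ is small enough to execute directly. The following is an illustrative check of Definition 4 over a list of accessibility facts, with names of our choosing.

```python
# Capabilities held by *different* actors on the same object are
# compatible iff whenever one is write, the other is tag.
# Checking all pairs of accessibility facts yields invariant I1.

def compatible(k1, k2):
    return (k1 != "write" or k2 == "tag") and \
           (k2 != "write" or k1 == "tag")

def data_race_free(views):
    """views: list of (actor, object, capability) accessibility facts."""
    return all(compatible(k1, k2)
               for (a1, o1, k1) in views
               for (a2, o2, k2) in views
               if a1 != a2 and o1 == o2)
```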

#### **Definition 5 (Derived counters—preliminary for** AMC **and** OMC**)**

$$\begin{aligned} \text{LRC}_{\mathcal{C}}(\iota) &\equiv \mathcal{O}(\iota).\text{rc}_{\mathcal{C}}(\iota) \\ \text{FRC}_{\mathcal{C}}(\iota) &\equiv \sum_{\alpha \neq \mathcal{O}(\iota)} \alpha.\text{rc}_{\mathcal{C}}(\iota) \\ \text{OMC}_{\mathcal{C}}(\iota) &\equiv \sum_{j} \begin{cases} z & \text{if } \mathcal{O}(\iota).\text{qu}_{\mathcal{C}}[j] = \mathsf{orca}(\iota : z) \\ 0 & \text{otherwise} \end{cases} \ + \ \dots\ \textit{c.f.}\ \text{Definition 12} \\ \text{AMC}_{\mathcal{C}}(\iota) &\equiv \#\{(\alpha, k) \mid k \geq 0 \,\wedge\, \exists x.\overline{f}.\ \mathcal{A}_{\mathcal{C}}(\alpha, k.x.\overline{f}) = \iota\} \ + \ \dots\ \textit{c.f.}\ \text{Definition 12} \end{aligned}$$

*where* # *denotes cardinality.*
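The four derived counters can be computed over a toy configuration as follows. This is an illustration with the "..." terms read as 0, as the text does below; the message encoding (`("orca", iota, z)` / `("app", reachable_addresses)`) is ours.

```python
# Computing the derived counters of Definition 5 over a toy
# configuration. An app message is abstracted to the set of
# addresses accessible from it.

def lrc(owner_rc, iota):
    return owner_rc.get(iota, 0)

def frc(foreign_rcs, iota):
    return sum(rc.get(iota, 0) for rc in foreign_rcs)

def omc(owner_queue, iota):
    return sum(m[2] for m in owner_queue
               if m[0] == "orca" and m[1] == iota)

def amc(queues, iota):
    # (actor, message-index) pairs whose app message reaches iota
    return sum(1 for q in queues.values()
                 for m in q
                 if m[0] == "app" and iota in m[1])
```

On numbers like those of the C0 example (LRC(ω3) = FRC(ω3) = 160), the four functions reproduce the values quoted below.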

For the time being, we read this preliminary definition as if ... stood for 0. This works under the assumption that the procedures are atomic. However, Sect. 5.3, where we consider fine-grained concurrency, will refine the definition of AMC and OMC so as to also consider whether an actor is currently in the process of sending or receiving a message from which the address is accessible. For the time being, we continue with the preliminary reading.

**Example:** Assuming that in C<sup>0</sup> none of the actors is sending or receiving, we have LRC<sup>C</sup><sup>0</sup> (ω3) = 160, and FRC<sup>C</sup><sup>0</sup> (ω3) = 160, and OMC<sup>C</sup><sup>0</sup> (ω3) = 0, and AMC<sup>C</sup><sup>0</sup> (ω3) = 0. Moreover, AMC<sup>C</sup><sup>0</sup> (ω6) = AMC<sup>C</sup><sup>0</sup> (α2) = 1: neither <sup>ω</sup><sup>6</sup> nor <sup>α</sup><sup>2</sup> is an argument in an application message, but both are indirectly reachable through the first message on α1's queue.

A well-formed configuration requires: **I1**–**I4**, introduced in Sect. 3; **I5**: the RCs are non-negative; **I6**: accessible paths are not dangling; **I7**: processing message queues will not turn RCs negative; **I8**: actors' contents are in accordance with their states. The latter two will be described in Definition 14.

**Definition 6 (Well-formed configurations—preliminary).** ⊢ C*,* iff *for all* α*,* αo*,* ι*,* ι′*,* p*,* lp*, and* mp*, such that* αo = O(ι) ≠ α*:*

**I<sup>1</sup>** ⊢ C ♦
**I<sup>2</sup>** [ AC(α, p) = ι ∨ AC(αo, mp) = ι ] −→ LRCC(ι) > 0
**I<sup>3</sup>** AC(α, lp) = ι −→ α.rcC(ι) > 0
**I<sup>4</sup>** LRCC(ι) + OMCC(ι) = FRCC(ι) + AMCC(ι)
**I<sup>5</sup>** α.rcC(ι′) ≥ 0
**I<sup>6</sup>** AC(α, p) = ι′ −→ C.heap(ι′) ≠ ⊥
**I7, I<sup>8</sup>** *description in Definition 14.*

For ease of notation, we take **I5** to mean that if α.rc_C(ι′) is defined, then it is positive, and we take any undefined entry of α.rc_C(ι) to be 0.
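As a running intuition for how **I4** and **I5** constrain a configuration's counters, the following sketch checks them over a toy configuration. All names (`Actor`, `lrc`, `frc`, the string addresses) are illustrative stand-ins, not part of the formal model; absent `rc` entries read as 0, per the convention above.

```python
# Illustrative check of invariants I4 and I5 over a toy configuration.
from collections import defaultdict

class Actor:
    def __init__(self, name, owns=()):
        self.name = name
        self.owns = set(owns)
        # I5 convention: undefined entries of rc are read as 0
        self.rc = defaultdict(int)

def lrc(actors, addr):
    # local reference count: the owner's own count for addr
    return next(a.rc[addr] for a in actors if addr in a.owns)

def frc(actors, addr):
    # foreign reference count: sum of the non-owners' counts for addr
    return sum(a.rc[addr] for a in actors if addr not in a.owns)

def i4_holds(actors, addr, omc=0, amc=0):
    # I4: LRC + OMC = FRC + AMC
    return lrc(actors, addr) + omc == frc(actors, addr) + amc

def i5_holds(actors, addr):
    # I5: all reference counts are non-negative
    return all(a.rc[addr] >= 0 for a in actors)

a1, a2 = Actor("alpha1", owns={"omega"}), Actor("alpha2")
a1.rc["omega"] = 1   # the owner's LRC
a2.rc["omega"] = 1   # one foreign reference
assert i4_holds([a1, a2], "omega")   # 1 + 0 == 1 + 0
assert i5_holds([a1, a2], "omega")
```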

### **4.3 Actor States**

We now complete the definition of runtime entities (Definition 1), and describe the states of an actor, the worksets, the marks, and the program counters (Definition 7). We distinguish the following states: idle (IDLE), collecting (COLLECT), receiving (RECEIVE), sending a message (SEND), or executing the synchronous part of a behaviour (EXECUTE). We discuss these states in more detail next.

Except for the idle state, IDLE, all states use auxiliary data structures: *worksets*, denoted by ws, which store sets of addresses; *marks*, denoted by ms, which map addresses to R (reachable) or U (unreachable); and program counters. Frames are relevant in states EXECUTE and SEND, and are otherwise assumed to be empty. Worksets store all addresses traced from a message or from the actor itself; they are relevant in states SEND, RECEIVE, and COLLECT, and are otherwise empty. Marks are used to calculate reachability in state COLLECT, and are ignored otherwise. The program counters record the instruction an actor will execute next; they range between 4 and 27 and are ghost state, *i.e.* only used in the proofs.

**Fig. 5.** State transitions diagram for an actor.

### **Definition 7 (Actor States, Working sets, and Marks)**

```
st ∈ State   ::= IDLE | EXECUTE | SEND | RECEIVE | COLLECT
ws ∈ Workset  =  P(Addr)
ms ∈ Marks    =  Addr → {R, U}
pc ∈ PC       =  [4..27]
```
We write α.st_C, α.ws_C, α.ms_C, or α.pc_C for the state, working set, marks, or program counter of α in C, respectively.

Actors may transition between states. The state transitions are depicted in Fig. 5. For example, an actor in the idle state (IDLE) may receive an orca message (remaining in the same state), receive an app message (moving to the RECEIVE state), or start garbage collection (moving to the COLLECT state).
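The transition discipline of Fig. 5, as far as the prose describes it, can be sketched as a partial transition table. The event names and any transitions beyond those mentioned in the text are assumptions; the diagram in Fig. 5 is the authority.

```python
# Sketch of the actor-state transitions described in the text (Fig. 5);
# event names and unlisted transitions are reconstructed assumptions.
IDLE, EXECUTE, SEND, RECEIVE, COLLECT = (
    "IDLE", "EXECUTE", "SEND", "RECEIVE", "COLLECT")

TRANSITIONS = {
    (IDLE, "receive_orca"): IDLE,          # pop an orca message, stay idle
    (IDLE, "receive_app"): RECEIVE,        # start receiving an app message
    (IDLE, "start_gc"): COLLECT,           # an idle actor may collect
    (EXECUTE, "start_gc"): COLLECT,        # an executing actor may collect
    (RECEIVE, "frame_pushed"): EXECUTE,    # frame installed, run behaviour
    (EXECUTE, "send_app"): SEND,           # begin sending an app message
    (SEND, "sent"): EXECUTE,               # message pushed, resume behaviour
    (EXECUTE, "go_idle"): IDLE,            # GoIdle: behaviour finished
    (COLLECT, "gc_done_empty_frame"): IDLE,        # was idle before GC
    (COLLECT, "gc_done_nonempty_frame"): EXECUTE,  # was executing before GC
}

def step(state, event):
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"illegal transition: {event} in state {state}")
    return TRANSITIONS[(state, event)]

assert step(IDLE, "receive_app") == RECEIVE
assert step(step(RECEIVE, "frame_pushed"), "go_idle") == IDLE
```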

In the following sections we describe the actions an actor may perform. Following the style of [17,26,27] we describe actors' actions through pseudo-code procedures, which have the form:

procedure name_α: condition → { instructions }

We let α denote the executing actor, and the left-hand side of the arrow describes the condition that must be satisfied in order to execute the instructions on the arrow's right-hand side. Any actor may execute concurrently with other actors. To simplify notation, we assume an implicit, globally accessible configuration C. Thus, the instruction α.state := EXECUTE is short for updating the state of α in C to be EXECUTE. We elide configurations when obvious, *e.g.* α.frame = φ is short for requiring that in C the frame of α is φ, but we mention them when necessary—*e.g.* ⊢ C[ι1, f ↦ ι2] ♦ expresses that the configuration that results from updating field f in ι1 is data-race free.

*Tracing Function.* Both garbage collection and application message sending/receiving need to find all objects accessible from the current actor and/or from the message arguments. We define two functions: trace_this, which finds all addresses accessible from the current actor, and trace_frame, which finds all addresses accessible through a stack frame (but not through the current actor, this).

**Fig. 6.** Pseudo-code for garbage collection.

**Definition 8 (Tracing).** *We define the functions*

trace_this : *Config* × *ActorAddr* → P(*Addr*)
trace_frame : *Config* × *ActorAddr* × *Frame* → P(*Addr*)

*as follows*

trace_this_C(α) ≡ {ι | ∃f̄. A_C(α, this.f̄) = ι}

trace_frame_C(α, φ) ≡ {ι | ∃x ∈ dom(φ), f̄. A_C(α, x.f̄) = ι}
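Operationally, both tracing functions are plain reachability computations. A minimal sketch over a toy heap, with field maps standing in for the accessibility function A_C, might look as follows (all names here are illustrative):

```python
# Illustrative sketch of trace_this / trace_frame as reachability over a
# toy heap; the heap encoding (dict of field maps) is an assumption.
def reachable(heap, roots):
    """All addresses reachable from the roots by following object fields."""
    seen, stack = set(), list(roots)
    while stack:
        addr = stack.pop()
        if addr in seen:
            continue
        seen.add(addr)
        stack.extend(heap.get(addr, {}).values())  # follow every field
    return seen

def trace_this(heap, actor_fields):
    # trace_this: everything accessible from the actor's own fields
    return reachable(heap, actor_fields.values())

def trace_frame(heap, frame):
    # trace_frame: everything accessible from a frame's local variables
    return reachable(heap, frame.values())

heap = {"w5": {"f": "w6"}, "w6": {"g": "w5"}, "w8": {}}
assert trace_frame(heap, {"x": "w5"}) == {"w5", "w6"}
assert trace_this(heap, {"f1": "w8"}) == {"w8"}
```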

### **4.4 Garbage Collection**

We describe garbage collection in Fig. 6. An idle or an executing actor (precondition on line 2) may start collecting at any time. It then sets its state to COLLECT (line 5), and initialises the marks, ms, to empty (line 6).

The main idea of ORCA collection is that the requirement of global unreachability of owned objects can be weakened to the local requirements of local unreachability and LRC = 0. Therefore, the actor marks all owned objects, and all addresses with an RC > 0, as U (line 9). After that, it traces the actor's fields, and also the actor's frame if it happens not to be empty (as we shall see later, idle actors have empty frames), and marks all accessible addresses as R (line 12). Then, the actor marks all owned objects with RC > 0 as R (line 15). Thus we expect that: **(\*)** *Any* ι *with* ms(ι) = U *is locally unreachable, and if owned by the current actor, then its* LRC *is 0.* For each address with ms(ι) = U: if the actor owns ι, then it collects it (line 20)—this is sound because of **I2**, **I3**, **I4** and (\*). If the actor does not own ι, then it asks ι's owner to decrement its reference count by the current actor's reference count, and deletes its own reference count for it (thus becoming 0) (line 24)—this preserves **I2**, **I3** and **I4**.
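The marking steps above can be sketched as one local GC cycle. The actor layout, the `send_orca` callback, and the tracing function are illustrative assumptions; Fig. 6 remains the authoritative pseudo-code.

```python
# A minimal sketch of one local GC cycle, following the marking steps in
# the text; Actor layout, send_orca, and trace are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Actor:
    owns: set = field(default_factory=set)
    rc: dict = field(default_factory=dict)      # per-address reference counts
    fields: dict = field(default_factory=dict)  # the actor's own fields (roots)

def locally_reachable(actor, heap):
    seen, stack = set(), list(actor.fields.values())
    while stack:
        a = stack.pop()
        if a not in seen:
            seen.add(a)
            stack.extend(heap.get(a, {}).values())
    return seen

def gc_cycle(actor, heap, send_orca, trace):
    ms = {}
    # mark all owned objects and all addresses with RC > 0 as U (unreachable)
    for addr in list(actor.owns) + [a for a, n in actor.rc.items() if n > 0]:
        ms[addr] = "U"
    # mark everything locally accessible (fields, frame if non-empty) as R
    for addr in trace(actor, heap):
        ms[addr] = "R"
    # owned objects with LRC > 0 stay alive: mark them R
    for addr in actor.owns:
        if actor.rc.get(addr, 0) > 0:
            ms[addr] = "R"
    for addr, mark in ms.items():
        if mark != "U":
            continue
        if addr in actor.owns:
            # locally unreachable and LRC = 0: sound to collect (I2, I3, I4)
            del heap[addr]
            actor.owns.discard(addr)
        else:
            # foreign: ask the owner to decrement by our count, then forget it
            send_orca(addr, -actor.rc[addr])
            actor.rc[addr] = 0

messages = []
a = Actor(owns={"w1"}, rc={"w1": 0, "w9": 2}, fields={})
heap = {"w1": {}, "w9": {}}
gc_cycle(a, heap, lambda i, z: messages.append((i, z)), locally_reachable)
assert "w1" not in heap           # owned, unreachable, LRC = 0: collected
assert messages == [("w9", -2)]   # foreign: owner asked to decrement
```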

There is no need for special provision for cycles across actor boundaries. Rather, the corresponding objects will be collected by each actor separately, when it is the particular actor's turn to perform GC.

**Example:** Consider the cycle ω5–ω6, and assume that the message app(b, ω5) has finished execution without any heap mutation, and that α1.rc_C(ω5) = α1.rc_C(ω6) = 1 = α2.rc_C(ω5) = α2.rc_C(ω6)—this will be the outcome of the example in Sect. 4.5. Now the objects ω5 and ω6 are globally unreachable. Assume that α1 performs GC: it will *not* be able to collect either of these objects, but it will send an orca(ω6 : −1) to α2. Some time later, α2 will pop this message, and later still it will enter a GC cycle: it will collect ω6, and send an orca(ω5 : −1) to α1. When, later on, α1 pops this message and then enters a GC cycle, it will collect ω5.
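The schedule in this example can be replayed with plain counters. The encoding below is an illustrative assumption, but the message flow is exactly the one described above: α1's GC sends a decrement for ω6, α2 then collects ω6 and sends a decrement for ω5, and α1 finally collects ω5.

```python
# Replaying the cross-actor cycle example: w5 is owned by a1, w6 by a2,
# each actor holds rc = 1 for both, and neither object is locally reachable.
rc = {"a1": {"w5": 1, "w6": 1}, "a2": {"w5": 1, "w6": 1}}
owner = {"w5": "a1", "w6": "a2"}
queues = {"a1": [], "a2": []}
collected = []

def gc(actor):
    # both objects are locally unreachable, so every rc entry is a candidate
    for addr in list(rc[actor]):
        if owner[addr] == actor:
            if rc[actor][addr] == 0:    # owned with LRC = 0: collect
                collected.append(addr)
                del rc[actor][addr]
        elif rc[actor][addr] > 0:
            # foreign: send an orca decrement to the owner, forget our count
            queues[owner[addr]].append((addr, -rc[actor][addr]))
            del rc[actor][addr]

def pop_orca(actor):
    addr, z = queues[actor].pop(0)
    rc[actor][addr] += z

gc("a1")                   # a1 cannot collect, sends orca(w6 : -1) to a2
pop_orca("a2"); gc("a2")   # a2 collects w6, sends orca(w5 : -1) to a1
pop_orca("a1"); gc("a1")   # a1 now collects w5
assert collected == ["w6", "w5"]
```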

At the end of the GC cycle, the actor sets its state back to what it was before (line 26). If the frame is empty, then the actor had been IDLE; otherwise it had been in state EXECUTE.

### **4.5 Receiving and Sending Messages**

Through message send or receive, actors share addresses with other actors. This changes accessibility. Therefore, action is needed to re-establish **I<sup>3</sup>** and **I<sup>4</sup>** for all the objects accessible from the message's arguments.

*Receiving application messages* is described by Receiving in Fig. 7. It requires that the actor α is in the IDLE state and has an application message on top of its queue. The actor sets its state to RECEIVE (line 5), traces from the message arguments, and stores all accessible addresses into ws (line 7). Since accessibility is not affected by other actors' actions (*cf.* the last paragraph in Sect. 4.6), it is legitimate to consider the calculation of trace_frame as one single step. The actor then pops the message from its queue (line 8), and thus the AMC of all the addresses in ws decreases by 1. To preserve **I4**, for each ι in its ws, the actor:

– decrements its reference count for ι, if it owns ι (so that LRC and AMC decrease together);
– increments its reference count for ι, otherwise (so that FRC rises as AMC falls).
After that, the actor sets its frame to that from the message (line 17), and goes to the EXECUTE state (line 18).

**Example:** Actor α<sup>1</sup> has an application message in its queue. Assuming that it is IDLE, it may execute Receiving: It will trace ω<sup>5</sup> and as a result store

**Fig. 7.** Receiving application and ORCA messages.

{ω5, ω6, ω8, α1, α2} in its ws. It will then decrement its reference counts for ω5 and α1 (the owned addresses) and increment them for the others. It will then pop the message from its queue, create the appropriate frame, and go to state EXECUTE.

*Receiving ORCA messages* is described in Fig. 7. An actor in the IDLE state with an ORCA message at the top, pops the message from its queue, and adds the value z to the reference count for ι, and stays in the IDLE state.
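The receive-side count adjustments that preserve **I4** when an application message is popped (its AMC contribution drops by 1 for every traced address) can be sketched as follows; the data layout is an illustrative assumption, and the example mirrors the one above.

```python
# Sketch of the receive-side count adjustments that preserve I4 when an
# app message is popped; the Actor layout is an illustrative assumption.
class Actor:
    def __init__(self, owns):
        self.owns, self.rc = set(owns), {}

def receive_app(actor, ws):
    for addr in ws:
        if addr in actor.owns:
            # owned: LRC drops together with AMC, keeping I4 balanced
            actor.rc[addr] = actor.rc.get(addr, 0) - 1
        else:
            # foreign: FRC rises as AMC falls, keeping I4 balanced
            actor.rc[addr] = actor.rc.get(addr, 0) + 1

a1 = Actor(owns={"w5", "a1"})
a1.rc = {"w5": 2, "a1": 1}
receive_app(a1, {"w5", "w6", "w8", "a1", "a2"})
assert a1.rc["w5"] == 1 and a1.rc["a1"] == 0           # owned: decremented
assert a1.rc["w6"] == a1.rc["w8"] == a1.rc["a2"] == 1  # foreign: incremented
```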

*Sending application messages* is described in Fig. 8. The actor must be in the EXECUTE state for some behaviour b, and must have local variables which can be split into ψ and ψ′—the latter will form part of the message to be sent. As the AMC of every address reachable through the message increases by 1, in order to preserve **I4**, for each address ι in ws, the actor:

– increments its reference count for ι, if it owns ι (so that LRC rises with AMC);
– decrements its reference count for ι, otherwise (so that FRC falls as AMC rises).
After this, it removes ψ′ from its frame (line 22), pushes the message app(b′, ψ′) onto α′'s queue, and returns to the EXECUTE state.
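Dually, a sketch of the send-side adjustments: the top-up step for a foreign count that would otherwise drop to 0 (which would break **I3** while the reference is still in flight) uses an assumed weight of 256, in the style of weighted reference counting; the exact mechanism and constants are assumptions here, and Fig. 8 is the authoritative pseudo-code.

```python
# Hedged sketch of send-side count adjustments preserving I4; the top-up
# weight (256) and the exact foreign-count handling are assumptions.
def send_app(owns, rc, ws, send_orca, topup=256):
    for addr in ws:
        if addr in owns:
            # owned: LRC rises with AMC, keeping I4 balanced
            rc[addr] = rc.get(addr, 0) + 1
        else:
            if rc[addr] == 1:
                # decrementing would zero our count while the address is
                # still in flight, so first borrow weight from the owner
                send_orca(addr, topup)
                rc[addr] += topup
            # foreign: FRC falls as AMC rises, keeping I4 balanced
            rc[addr] -= 1

msgs = []
rc = {"w1": 1, "w2": 1}
send_app({"w1"}, rc, ["w1", "w2"], lambda i, z: msgs.append((i, z)))
assert rc["w1"] == 2           # owned address: incremented
assert rc["w2"] == 256         # foreign: topped up by 256, then decremented
assert msgs == [("w2", 256)]   # owner was asked to increment
```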

**Fig. 8.** Pseudo-code for message sending.

We now discuss the preconditions. These ensure that sending the message app(b′, ψ′) will not introduce data races: line 4 ensures that there are no data races between paths starting at ψ and paths starting at ψ′, while line 5 ensures that the sender, α, and the receiver, α′, see all the paths sent, *i.e.* those starting from (b′, ψ′), at the same capability. We express our expectation that the source language compiler produces code only if it satisfies this property by adding this static requirement as a precondition. These static requirements imply that after the message has been sent, there will be no races between paths starting at the sender's frame and those starting at the last message in the receiver's queue. In more detail, after the sender's frame has been reduced to (b, ψ), and app(b′, ψ′) has been added to the receiver's queue (at position k), we obtain a new configuration C′ = C[α, frame ↦ (b, ψ)][α′, queue ↦ α′.queue_C :: (b′, ψ′)]. In this new configuration, lines 4 and 5 ensure that A_C′(α, x.f̄) = (ι, κ) ∧ A_C′(α′, k.x′.f̄′) = (ι, κ′) −→ κ ∼ κ′, which means that if there were no data races in C, there will be no data races in C′ either. Formally: ⊢ C ♦ −→ ⊢ C′ ♦.

We can now complete Definition 3 for the receiving and the sending cases, to take into account paths that do not exist yet, but which will exist when the message receipt or message sending has been completed.

**Definition 9 (accessibility—receiving and sending).** *Completing Definition 3:* A_C(α, −1.x.f̄) = (ι, κ) *iff*
– α.st_C = *Receiving* ∧ 9 ≤ α.pc_C < 18 ∧ C(ψ(x).f̄) = ι ∧ Class(α) ⊢ b.x.f̄ : κ, *where* (b, ψ) *is the frame popped at line 8, or*
– α.st_C = *Sending* ∧ α.pc_C = 23 ∧ C(ψ′(x).f̄) = ι ∧ Class(α′) ⊢ b′.x.f̄ : κ, *where* α′ *is the actor to receive the app-message, and* (b′, ψ′) *is the frame to be sent in line 23.*

**Example:** When actor α1 executes Receiving and its program counter is between 9 and 18, then A_C0(α1, −1.x.f5) = (ω6, write), even though x is not yet on the stack frame. As soon as the frame is pushed onto the stack and we reach program counter 20, A_C0(α1, −1.x.f5) is undefined, but A_C0(α1, x.f5) = (ω6, write).

### **4.6 Actor Behaviour**

As our model is parametric in the host language, we do not aim to describe any of the actions performed while executing behaviours, such as synchronous method calls, pushing frames onto stacks, conditionals, loops, etc. Instead, we concentrate on how behaviour execution may affect GC; this happens only when the heap is mutated, either by object creation or by mutation of objects' fields (since this affects accessibility). In particular, our model does not accommodate recursive calls; we claim that the results for the current model would extend easily to a model with recursion in synchronous behaviour, but at the cost of considerable notational overhead.

Figure 9 shows the actions of an actor α while in the EXECUTE state, *i.e.* while it executes behaviours synchronously. The description is nondeterministic: the procedures GoIdle, Create, and MutateHeap may execute whenever the corresponding preconditions hold. Thus, we do not describe the execution of a given program; rather, we describe all possible executions for any program. In GoIdle, the actor α simply passes from the execution state to the idle state; the only condition is that its state is EXECUTE (line 2). It deletes the frame and sets the actor's state to IDLE (line 4). Create creates a new object, initialises its fields to null, and stores its address into the local variable x.

The most interesting procedure is field assignment, MutateHeap. Line 8 modifies the object at address ι1, reachable through local path lp1, and stores in its field f the address ι2, which was reachable through local path lp2. We require that the type system makes the following two guarantees: line 2, second conjunct, requires that lp1 be writable, while line 3 requires that lp2 be accessible. Lines 4 and 5 require that capabilities of objects do not increase through heap mutation: any address that is accessible with a capability κ after the field update was accessible with the same or a more permissive capability κ′ before the field update. This requirement guarantees preservation of data race freedom, *i.e.* that ⊢ C ♦ implies ⊢ C[ι1, f ↦ ι2] ♦.

**Fig. 9.** Pseudo-code for synchronous operations.

*Heap Mutation Does Not Affect Accessibility in Other Actors.* Heap mutation either creates new objects, which are not accessible to other actors, or modifies objects to which the current actor has write access. By ⊢ C ♦, all other actors have only tag access to the modified object. Therefore, because of *capabilities' degradation with growing paths (as in* **A1** *and* **A2***)*, no other actor is able to access objects reachable through paths that go through the modified object.

### **5 Soundness and Completeness**

In this section we show soundness and completeness of ORCA.

### **5.1 I<sup>1</sup> and I<sup>2</sup> Support Safe Local GC**

As we said earlier, **I<sup>1</sup>** and **I<sup>2</sup>** support safe local GC. Namely, **I<sup>1</sup>** guarantees that as long as GC only traces objects to which the actor has read or write access, there will be no data races with other actors' behaviour or GC. And **I<sup>2</sup>** guarantees that collection can take place based on local information only:

**Definition 10.** *For a configuration* <sup>C</sup>*, and object address* <sup>ω</sup> *we say that*

*–* ω *is* globally inaccessible *in* C*, iff* ∀α, p. A_C(α, p) ≠ ω

*–* ω *is* collectable*, iff* LRC_C(ω) = 0*, and* ∀lp. A_C(O(ω), lp) ≠ ω*.*

**Lemma 2.** *If* **I<sup>2</sup>** *holds, then every collectable object is globally inaccessible.*

### **5.2 Completeness**

In [16] we show that globally inaccessible objects remain so, and that for any globally inaccessible object there exists a sequence of steps which will collect it.

**Theorem 1 (Inaccessibility is monotonic).** *For any configurations* C *and* C′*: if* C′ *is the outcome of the execution of any single line of code from any of the procedures in Figs. 6, 7, 8 and 9, and* ω *is globally inaccessible in* C*, then* ω *is globally inaccessible in* C′*.*

**Theorem 2 (Completeness of** ORCA**).** *For any configuration* C *and object address* ω *which is globally inaccessible in* C*, there exists a finite sequence of steps leading to a configuration* C′ *in which* ω ∉ dom(C′)*.*

### **5.3 Dealing with Fine-Grained Concurrency**

So far, we have discussed actions under an assumption of atomicity. However, ORCA needs to work under fine-grained concurrency, whereby several actors may be executing concurrently, each of them executing a behaviour, sending or receiving a message, or collecting garbage. With fine-grained concurrency, and with the preliminary definitions of AMC and OMC, the invariants are no longer preserved. In fact, they need never hold!

**Example:** Consider Fig. 4, and assume that actor α1 was executing Receiving. Then, at line 7, before popping the message off the queue, we have LRC(ω5) = 2, FRC(ω5) = 1, and AMC^p(ω5) = 1, where AMC^p(_) stands for the preliminary definition of AMC; thus **I4** holds. After popping and before updating the RC for ω5, *i.e.* between lines 9 and 11, we have AMC^p(ω5) = 0—thus **I4** is broken. At first sight this might not seem a big problem, because the update of the RC at line 12 will set LRC(ω5) = 1 and thus restore **I4**. However, if there were another message containing ω5 in α2's queue, then in a snapshot where α2 had just finished line 8 and α1 had just finished line 12, the update of α1's RC would *not* restore **I4**.

The reason for this problem is that, with the preliminary definition AMC^p(_), upon popping at line 8 the AMC is decremented in one atomic step for all objects accessible from the message, while the RC is updated later on (at line 12 or line 14), one object at a time. In other words, the updates to AMC and LRC are not in sync. Instead, we give the full definition of AMC so that AMC stays in sync with LRC: it is not affected by popping the message, and is reduced one object at a time once we reach program counter 15. Similarly, because updating the RCs takes place in a separate step from the removal of the ORCA message from the queue, we refine the definition of OMC:

**Definition 11 (Auxiliary Counters for** AMC**, and** OMC**)**

$$\begin{array}{rcl} \mathit{AMC}^{\mathit{rcv}}_{\mathcal{C}}(\iota) & \equiv & \#\{\,\alpha \mid \alpha.\mathsf{st}_{\mathcal{C}} = \mathsf{RECEIVE} \ \wedge\ 9 \leq \alpha.\mathsf{pc}_{\mathcal{C}} \ \wedge\ \iota \in \alpha.\mathsf{ws} \setminus \mathit{CurrAddrRcv}_{\mathcal{C}}(\alpha)\,\}\\[4pt] \mathit{CurrAddrRcv}_{\mathcal{C}}(\alpha) & \equiv & \begin{cases} \{\iota_{10}\} & \text{if } \alpha.\mathsf{pc}_{\mathcal{C}} = 15 \\ \emptyset & \text{otherwise} \end{cases} \end{array}$$

*In the above* α.ws *refers to the contents of the variable* ws *while the actor* α *is executing the pseudocode from Receiving, and* ι<sup>10</sup> *refers to the contents of the variable* ι *arbitrarily chosen in line 10 of the code.*

*We define* AMC^snd_C(ι)*,* OMC^rcv_C(ι)*, and* OMC^snd_C(ι) *similarly in [16].*

The counters AMC^rcv and AMC^snd are zero except for actors which are in the process of receiving or sending application messages. Likewise, the counters OMC^rcv and OMC^snd are zero except for actors which are in the process of receiving or sending ORCA messages. All these counters are always ≥ 0. We can now complete the definition of AMC and OMC:

**Definition 12 (**AMC **and** OMC *–* **full definition)**

$$\mathit{OMC}_{\mathcal{C}}(\iota) \;\equiv\; \sum_{j} \begin{cases} z & \text{if } \mathcal{O}(\iota).\mathsf{qu}_{\mathcal{C}}[j] = \mathsf{orca}(\iota : z) \\ 0 & \text{otherwise} \end{cases} \;+\; \mathit{OMC}^{\mathit{snd}}_{\mathcal{C}}(\iota) \;-\; \mathit{OMC}^{\mathit{rcv}}_{\mathcal{C}}(\iota)$$

$$\mathit{AMC}_{\mathcal{C}}(\iota) \;\equiv\; \#\{\,(\alpha, k) \mid k > 0 \ \wedge\ \exists x.\overline{f}.\ \mathcal{A}_{\mathcal{C}}(\alpha, k.x.\overline{f}) = \iota\,\} \;+\; \mathit{AMC}^{\mathit{snd}}_{\mathcal{C}}(\iota) \;+\; \mathit{AMC}^{\mathit{rcv}}_{\mathcal{C}}(\iota)$$

*where* # *denotes cardinality.*

**Example:** Let us again assume that α1 was executing Receiving. Then, at line 10, we have ws = {ω5, ω6} and AMC(ω5) = 1 = AMC(ω6). Assume that at the first iteration, at line 10, we choose ω5; then right before reaching line 15 we have AMC(ω5) = 0 and AMC(ω6) = 1. At the second iteration, at line 10, we choose ω6, and right before reaching line 15 we have AMC(ω6) = 0.

### **5.4 Soundness**

To complete the definition of well-formed configurations, we need to define what it means for an actor or a queue to be well-formed.

**Well-Formed Queues - I7.** The owner's reference count for any live address (*i.e.* any address reachable from a message path, from a foreign actor, or from an ORCA message) should be greater than 0 in the current configuration, as well as in all configurations which arise from receiving pending, but no new, messages from the owner's queue. Thus, in order to ensure that ORCA decrement messages do not make the local reference count negative, **I7** requires that the effect of any prefix of the message queue leaves the reference count of any object positive. To formulate **I7** we use the concept of *QueueEffect*_C(α, ι, n), which describes the contents of LRC after the actor α has consumed and reacted to the first n messages in its queue—*i.e.* it is about "looking into the future". Thus, for actor α, address ι, and number n, we define the effect of the n-prefix of the queue on the reference count as follows:

$$\mathit{QueueEffect}_{\mathcal{C}}(\alpha,\iota,n) \;\equiv\; \mathit{LRC}_{\mathcal{C}}(\iota) - z + \sum_{j=0}^{n} \mathit{Weight}_{\mathcal{C}}(\alpha,\iota,j)$$

where z = k if α is in the process of executing ReceiveORCA, α.pc_C = 6, and α.qu.top = orca(ι : k); otherwise z = 0.

And where,

$$\mathit{Weight}_{\mathcal{C}}(\alpha, \iota, j) \;\equiv\; \begin{cases} z' & \text{if } \alpha.\mathsf{qu}_{\mathcal{C}}[j] = \mathsf{orca}(\iota : z') \\ -1 & \text{if } \exists x.\,\exists \overline{f}.\ \mathcal{A}_{\mathcal{C}}(\alpha, j.x.\overline{f}) = \iota \ \wedge\ \mathcal{O}(\iota) = \alpha \\ 0 & \text{otherwise} \end{cases}$$

**I7** makes the following four guarantees: **[a]** The effect of any prefix of the message queue leaves the LRC non-negative. **[b]** If ι is accessible from the j-th message in its owner's queue, then the LRC for ι will remain > 0 during execution of the current message queue up to, and including, the j-th message. **[c]** If ι is accessible from an ORCA message, then the LRC will remain > 0 during execution of the current message queue, up to and excluding execution of that ORCA message itself. **[d]** If ι is globally accessible (*i.e.* reachable from a local path or from a message in a non-owning actor), then LRC(ι) is currently > 0, and will remain so during the popping of all the entries in the current queue.

**Definition 13 (I7).** ⊨_Queues C*, iff for all* j ∈ ℕ*, for all addresses* ι*, and actors* α*,* α′*, where* O(ι) = α ≠ α′*, the following conditions hold:*


For example, a configuration with LRC(ι) = 2 and a queue orca(ι : −2) :: orca(ι : −1) :: orca(ι : 256) is illegal by **I7**.**[a]**. Similarly, in a configuration with LRC(ι) = 2 and a queue orca(ι : −2) :: orca(ι : 256), the owning actor could collect ι before popping the message orca(ι : 256) from its queue. Such a configuration is also deemed illegal by **I7**.**[c]**.
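The prefix check behind **I7**.**[a]** is easy to state executably: replay the queue's weights over the LRC and require every prefix sum to stay non-negative. The message encoding below is an illustrative assumption.

```python
# Sketch of the I7.[a] prefix check; messages are modelled as ('orca', z)
# or ('app', owned_accessible) tuples, an illustrative encoding.
def queue_effect(lrc, queue, n):
    """LRC after consuming the first n messages (Weight summed over a prefix)."""
    total = lrc
    for kind, payload in queue[:n]:
        if kind == "orca":
            total += payload   # orca(iota : z) contributes z
        elif kind == "app" and payload:
            total -= 1         # owned address accessible from the message
    return total

def i7a_holds(lrc, queue):
    # every prefix of the queue must leave the LRC non-negative
    return all(queue_effect(lrc, queue, n) >= 0 for n in range(len(queue) + 1))

# the illegal queue from the text: LRC = 2, orca(-2) :: orca(-1) :: orca(256)
assert not i7a_holds(2, [("orca", -2), ("orca", -1), ("orca", 256)])
# putting the increment first makes every prefix non-negative
assert i7a_holds(2, [("orca", 256), ("orca", -2), ("orca", -1)])
```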

**I8-Well-Formed Actor.** In [16] we define well-formedness of an actor α through the judgement C, α ⊢ st. This judgement depends on α's current state st and requires, among other things, that the contents of the local variables ws and ms are consistent with the contents of the pc and the RC. Remember also that, because Receiving and Sending modify ws or send ORCA messages before updating the frame or sending the application message, in the definitions of AMC and OMC we took into account the internal state of actors executing these procedures.

**Well-Formed Configuration.** The following completes Definition 6 from Sect. 4.2.

**Definition 14 (Well-formed configurations—full).** *A configuration* C *is well-formed,* ⊨ C*, iff* **I1***–***I6** *hold (Definition 6), its queues are well-formed (*⊨_Queues C*,* **I7***), and so are all its actors (*C, α ⊢ α.st_C*,* **I8***).*

In [16] we consider the execution of each line in the codes from Sect. 4, and prove:

**Theorem 3 (Soundness of** ORCA**).** *For any configurations* C *and* C′*: if* ⊨ C*, and* C′ *is the outcome of the execution of any single line of code from any of the procedures in Figs. 6, 7, 8 and 9, then* ⊨ C′*.*

This theorem, together with **I6**, implies that ORCA never leaves accessible paths dangling. Note that the theorem is stated so as to be applicable to a fine-grained interleaving of the execution. Even though we expressed ORCA through procedures, in our proof we cater for executions where a single line of any of these procedures is interleaved with lines of the procedures running in other actors.

### **6 Related Work**

The challenges faced when developing and debugging concurrent garbage collectors have motivated the development of formal models and proofs of correctness [6,13,19,30,35]. However, most work considers a global heap where mutator and collector threads *race* for objects and relies on synchronisation mechanisms (or atomic reduction steps), such as read or write barriers, in contrast to ORCA, which considers many local heaps, no atomicity or synchronisation, and relies on the properties of the type system. McCreight et al. [25] introduced a framework to reason about and build certified garbage collectors, verifying independently both mutator and collector threads. Their work focuses mainly on garbage collectors similar to those that run on Java programs, such as STW mark-and-sweep, STW copying and incremental copying. Vechev et al. [39] specified concurrent mark-and-sweep collectors with write barriers for synchronisation. The authors also present a parametric garbage collector from which other collectors can be derived. Hawblitzel and Petrank [22] mechanised proofs of two real-world collectors (copying and mark-and-sweep) and their respective allocators. The assembly code was instrumented with pre- and post-conditions, invariants and assertions, which were then verified using Z3 and Boogie. Ugawa et al. [38] extended a copying, on-the-fly, concurrent garbage collector to process reference types. The authors model-checked their algorithm using a model that limited the number of objects and threads. Gamie et al. [17] machine-checked a state-of-the-art, on-the-fly, concurrent, mark-and-sweep garbage collector [32]. They modelled one collector thread and many mutator threads. ORCA does not limit the number of actors running concurrently.

Local heaps have been used before in the context of garbage collection to reduce the amount of synchronisation required [1–3,13,15,24,31,34], where different threads have their own heap and share a global heap. However, only two of these designs have been proved correct. Doligez and Gonthier [13] proved correct a collector [14] which splits the heap into many local heaps and one global heap, and uses mark-and-sweep for the individual collection of local heaps. The algorithm imposes restrictions on the object graph: a thread cannot access objects in other threads' local heaps. ORCA allows references across heaps. Raghunathan et al. [34] proved correct a hierarchical model of local heaps for functional programming languages. That work restricted object graphs and prevented mutation.

As for collectors that rely on message passing, Moreau et al. [26] revisited Birrell's reference-listing algorithm, which also uses message passing to update reference counts in a distributed system, and presented its formalisation along with proofs of soundness and completeness. Moreover, Clebsch and Drossopoulou [10] proved correct MAC, a concurrent collector for actors.

### **7 Conclusions**

We have shown the soundness and completeness of the ORCA actor memory reclamation protocol. The ORCA model is not tied to a particular programming language; it is parametric in the host language. Instead, it relies on a number of invariants and properties which can be met by a combination of language design and static checks. The central property required is the absence of data races on objects shared between actors.

We developed a formal model of ORCA and identified requirements for the host language, its type system, or associated tooling. We described ORCA at a language-agnostic level and identified eight invariants that capture how global consistency is obtained in the absence of synchronisation. We proved that ORCA will not prematurely collect objects (soundness) and that all garbage will be identified as such (completeness).

**Acknowledgements.** We are deeply grateful to Tim Wood for extensive discussions and suggestions about effective communication of our ideas. We thank Rakhilya Mekhtieva for her contributions to the formal proofs, Sebastian Blessing and Andy McNeil for their contributions to the implementation, as well as the anonymous reviewers for their insightful comments. This work was initially funded by Causality Ltd, and has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement 695412) and the FP7 project UPSCALE, the Swedish Research Council through the grant Structured Aliasing and the UPMARC Linnaeus Centre of Excellence, the EPSRC (grant EP/K011715/1), the NSF (award 1544542) and ONR (award 503353).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Paxos Consensus, Deconstructed and Abstracted**

Álvaro García-Pérez<sup>1(B)</sup>, Alexey Gotsman<sup>1</sup>, Yuri Meshman<sup>1</sup>, and Ilya Sergey<sup>2</sup>

<sup>1</sup> IMDEA Software Institute, Madrid, Spain {alvaro.garcia.perez,alexey.gotsman,yuri.meshman}@imdea.org <sup>2</sup> University College London, London, UK i.sergey@ucl.ac.uk

**Abstract.** Lamport's Paxos algorithm is a classic consensus protocol for state machine replication in environments that admit crash failures. Many versions of Paxos exploit the protocol's intrinsic properties for the sake of gaining better run-time performance, thus widening the gap between the original description of the algorithm, which was proven correct, and its real-world implementations. In this work, we address the challenge of specifying and verifying complex Paxos-based systems by (a) devising composable specifications for implementations of Paxos's single-decree version, and (b) engineering disciplines to reason about protocol-aware, semantics-preserving optimisations to single-decree Paxos. In a nutshell, our approach elaborates on the deconstruction of single-decree Paxos by Boichat et al. We provide novel non-deterministic specifications for each module in the deconstruction and prove that the implementations refine the corresponding specifications, such that the proofs of the modules that remain unchanged can be reused across different implementations. We further reuse this result and show how to obtain a verified implementation of Multi-Paxos from a verified implementation of single-decree Paxos, by a series of novel protocol-aware transformations of the network semantics, which we prove to be behaviour-preserving.

### **1 Introduction**

Consensus algorithms are an essential component of modern fault-tolerant deterministic services implemented as message-passing distributed systems. In such systems, each of the distributed nodes contains a replica of the system's state (*e.g.*, a database to be accessed by the system's clients), and certain nodes may propose values for the next state of the system (*e.g.*, requesting an update in the database). Since any node can crash at any moment, all the replicas have to keep copies of the state that are consistent with each other. To achieve this, at each update to the system, all the non-crashed nodes run an instance of a *consensus protocol*, uniformly deciding on its outcome. The safety requirements for consensus can thus be stated as follows: "only a single value is decided uniformly by all non-crashed nodes, it never changes in the future, and the decided value has been proposed by some node participating in the protocol" [16].

The Paxos algorithm [15,16] is the classic consensus protocol, and its single-decree version (SD-Paxos for short) allows a set of distributed nodes to reach an agreement on the outcome of a *single* update. Optimisations and modifications to SD-Paxos are common. For instance, the multi-decree version, often called Multi-Paxos [15,27], considers multiple slots (*i.e.*, multiple positioned updates) and decides upon a result for *each* slot by running a slot-specific instance of SD-Paxos. Even though it is customary to think of Multi-Paxos as a series of independent SD-Paxos instances, in reality the implementation features multiple protocol-aware optimisations, exploiting intrinsic dependencies between separate single-decree consensus instances to achieve better throughput. To a great extent, these and other optimisations to the algorithm are pervasive, and verifying a modified version usually requires devising a new protocol definition and a proof from scratch. New versions are constantly springing up (*cf.* Sect. 5 of [27] for a comprehensive survey), widening the gap between the description of the algorithms and their real-world implementations.

We tackle the challenge of *specifying* and *verifying* these distributed algorithms by contributing two verification techniques for consensus protocols.

Our first contribution is a family of composable specifications for Paxos' core subroutines. Our starting point is the deconstruction of SD-Paxos by Boichat *et al.* [2,3], which allows one to consider a distributed consensus instance as a *shared-memory concurrent program*. We introduce novel specifications for Boichat *et al.*'s modules, and let them be non-deterministic. This might seem an unorthodox design choice, as it *weakens* the specifications. To show that our specifications are still *strong enough*, we restore the top-level *deterministic* abstract specification of consensus, which is convenient for client-side reasoning. The weakness introduced by the non-determinism in the specifications is necessitated by the need to prove that the implementations of Paxos' components *refine* the specifications we have ascribed [9]. We prove the refinements modularly via Rely/Guarantee reasoning with prophecy variables and explicit linearisation points [11,26]. On the other hand, this weakness becomes a virtue when it comes to better understanding the volatile nature of Boichat *et al.*'s abstractions and of the Paxos algorithm, which may lead to new modifications and optimisations.

Our second contribution is a methodology for verifying composite consensus protocols by reusing the proofs of their constituents, specifically targeting Multi-Paxos. We distill protocol-aware system optimisations into a separate semantic layer and show how to obtain a realistic Multi-Paxos implementation from SD-Paxos by a *series of transformations* to the *network semantics* of the system, as long as these transformations preserve the behaviour observed by clients. We then provide a family of such transformations, along with the formal conditions that allow one to compose them in a behaviour-preserving way.

We validate our approach to constructing modularly verified consensus protocols by providing an executable proof-of-concept implementation of Multi-Paxos with a high-level shared-memory-like interface, obtained via a series of behaviour-preserving network transformations. The full proofs of lemmas and theorems from our development, as well as some boilerplate definitions, are given in the appendices of the supplementary extended version of this paper.<sup>1</sup>

**Fig. 1.** A run of SD-Paxos.

### **2 The Single-Decree Paxos Algorithm**

We start by explaining SD-Paxos through an intuitive scenario. In SD-Paxos, each node in the system can adopt the roles of *proposer* or *acceptor*, or both. A value is decided when a *quorum* (*i.e.*, a majority of acceptors) accepts the value proposed by some proposer. Now consider a system with three nodes N1, N2 and N3, where N1 and N3 are both proposers and acceptors, and N2 is an acceptor, and assume N1 and N3 propose values v1 and v3, respectively.
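The quorum notion is what makes the protocol safe: any two majorities of the n acceptors overlap in at least one node. This can be checked mechanically with a small Python sketch (ours, not part of the paper's pseudo-code):

```python
from itertools import combinations

def quorums(n):
    """All quorums (strict majorities) of the acceptors {1, ..., n}."""
    size = n // 2 + 1
    return [set(q) for q in combinations(range(1, n + 1), size)]

def all_pairs_intersect(n):
    """Check that every pair of quorums shares at least one acceptor."""
    qs = quorums(n)
    return all(q1 & q2 for q1 in qs for q2 in qs)

# In the three-node system of the example, the quorums are
# {N1, N2}, {N1, N3} and {N2, N3}, and any two of them intersect.
assert sorted(map(sorted, quorums(3))) == [[1, 2], [1, 3], [2, 3]]
assert all_pairs_intersect(3)
assert all_pairs_intersect(5)
```

This intersection property is what rules out the scenario sketched below in which two disjoint quorums decide different values.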

The algorithm works in two phases. In Phase 1, a proposer polls every acceptor in the system and tries to convince a quorum to promise that they will later accept its value. If the proposer succeeds in Phase 1 then it moves to Phase 2, where it requests the acceptors to fulfil their promises in order to get its value decided. In our example, it would seem in principle possible that N1 and N3 could convince two different quorums (one consisting of N1 and N2, the other of N2 and N3) to go through both phases and to accept their respective values. This would happen if the communication between N1 and N3 gets lost and N2 successively grants the promise and accepts the value of N1, and then does the same with N3. This scenario breaks the safety requirements for consensus because both v1 and v3 (which can be different) would get decided. However, this cannot happen. Let us explain why.

The way SD-Paxos enforces the safety requirements is by distinguishing each attempt to decide a value with a unique *round*, where the rounds are totally ordered. Each acceptor stores its current round, initially the least one, and only grants a promise to proposers with a round greater than or equal to its current round, at which moment the acceptor switches to the proposer's round. Figure 1 depicts a possible run of the algorithm. Assume that rounds are natural numbers, that the acceptors' current rounds are initially 0, and that the nodes N1 and N3 attempt to decide their values with rounds 1 and 3 respectively. In Phase 1, N1 tries to convince a quorum to switch their current round to 1 (messages P1A(1)). The message to N3 gets lost and the quorum consisting of N1 and N2 switches round and promises to only accept values at a round greater than or

<sup>1</sup> Find the extended version online at https://arxiv.org/abs/1802.05969.

**Fig. 2.** Deconstruction of SD-Paxos (left) and specification of module *Paxos* (right).

equal to 1. Each acceptor that switches to the proposer's round sends back to the proposer its stored value and the round at which this value was accepted, or an undefined value if the acceptor never accepted any value yet (messages P1B(ok, ⊥, 0), where ⊥ denotes a default undefined value). After Phase 1, N1 picks as candidate value the one accepted at the greatest round among those returned by the acceptors in the quorum, or its own proposed value if all acceptors returned an undefined value. In our case, N1 picks its value v1. In Phase 2, N1 requests the acceptors to accept the candidate value v1 at round 1 (messages P2A(v1, 1)). The message to N3 gets lost, and N1 and N2 accept value v1, which gets decided (messages P2B(ok)).
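The candidate-selection rule at the end of Phase 1 can be stated in a few lines. A Python sketch (ours; `acks` models the quorum's P1B(ok, v, w) answers as value/round pairs, with None standing for ⊥):

```python
UNDEF = None  # stands for the undefined value ⊥ of the paper

def pick_candidate(acks, proposed):
    """Among a quorum's P1B(ok, v, w) answers (pairs of a value and the
    round w at which it was accepted, with w = 0 meaning 'never accepted'),
    pick the value accepted at the greatest round; fall back to the
    proposer's own value if no acceptor has accepted anything yet."""
    max_w, max_v = 0, UNDEF
    for v, w in acks:
        if w >= max_w:
            max_w, max_v = w, v
    return proposed if max_v is UNDEF else max_v

# N1's Phase 1 in the run of Fig. 1: both answers are (⊥, 0), so N1 picks v1.
assert pick_candidate([(UNDEF, 0), (UNDEF, 0)], "v1") == "v1"
# N3's Phase 1: N2 answers (v1, 1), N3 answers (⊥, 0), so N3 must pick v1.
assert pick_candidate([("v1", 1), (UNDEF, 0)], "v3") == "v1"
```

The second assertion is exactly why N3 re-decides v1 rather than v3 in the run described next.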

Now N3 goes through Phase 1 with round 3 (messages P1A(3)). Both N2 and N3 switch to round 3. N2 answers N3 with its stored value v1 and with the round 1 at which v1 was accepted (message P1B(ok, v1, 1)), and N3 answers itself with an undefined value, as it has never accepted any value yet (message P1B(ok, ⊥, 0)). This way, if some value has already been decided upon, *any* proposer that convinces a quorum to switch to its round will receive the decided value from some of the acceptors in the quorum (recall that any two quorums have a non-empty intersection). That is, N3 picks the v1 returned by N2 as the candidate value, and in Phase 2 it gets the quorum consisting of N2 and N3 to accept v1 at round 3 (messages P2A(v1, 3) and P2B(ok)). N3 succeeds in making a new decision, but the decided value remains the same, and, therefore, the safety requirements of a consensus protocol are satisfied.

### **3 The Faithful Deconstruction of SD-Paxos**

We now recall the faithful deconstruction of SD-Paxos from [2,3], which we take as the reference architecture for the implementations that we aim to verify. We later show how each module of the deconstruction can be verified separately.

The deconstruction, depicted on the left of Fig. 2, consists of modules *Paxos*, *Round-Based Consensus* and *Round-Based Register*. These modules correspond to the ones in Fig. 4 of [2], with the exception of *Weak Leader Election*: we assume that a correct process that is trusted by every other correct process always exists, and omit the details of leader election. Leaders take the role of proposers and invoke the interface of *Paxos*. Each module uses the interface provided by the module below it.

```
1 read(int k) {
2   int j; val v; int kW; val maxV;
3   int maxKW; set of int Q; msg m;
4   for (j := 1, j <= n, j++)
5     { send(j, [RE, k]); }
6   maxKW := 0; maxV := undef; Q := {};
7   do { (j, m) := receive();
8     switch (m) {
9       case [ackRE, @k, v, kW]:
10        Q := Q ∪ {j};
11        if (kW >= maxKW)
12          { maxKW := kW; maxV := v; }
13      case [nackRE, @k]:
14        return (false, _);
15    } if (|Q| = (n+1)/2)
16      { return (true, maxV); } }
17  while (true); }

18 write(int k, val vW) {
19   int j; set of int Q; msg m;
20   for (j := 1, j <= n, j++)
21     { send(j, [WR, k, vW]); }
22   Q := {};
23   do { (j, m) := receive();
24     switch (m) {
25       case [ackWR, @k]:
26         Q := Q ∪ {j};
27       case [nackWR, @k]:
28         return false;
29     } if (|Q| = (n+1)/2)
30       { return true; } }
31   while (true); }
```
**Fig. 3.** Implementation of *Round-Based Register* (read and write).

The entry module *Paxos* implements SD-Paxos. Its specification (right of Fig. 2) keeps a variable vP that stores the decided value (initially undefined) and provides the operation proposeP, which takes a proposed value v0 and returns vP if some value was already decided, or otherwise returns v0. The code of the operation runs *atomically*, which we emphasise via angle brackets ⟨...⟩. We define this specification so that it meets the safety requirements of consensus; therefore, any implementation whose entry point refines this specification also meets the same safety requirements.
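The specification is small enough to be transliterated directly. A Python sketch (ours, with None standing for undef; the atomicity of proposeP is not modelled here):

```python
UNDEF = None  # stands for undef in the paper's pseudo-code

class PaxosSpec:
    """Sketch of the specification on the right of Fig. 2. In the paper
    the body of proposeP runs atomically; here we simply run it
    sequentially."""

    def __init__(self):
        self.vP = UNDEF  # the decided value, initially undefined

    def proposeP(self, v0):
        if self.vP is UNDEF:
            self.vP = v0    # the first proposal gets decided
        return self.vP      # later proposals observe the decision

spec = PaxosSpec()
assert spec.proposeP("v1") == "v1"  # first proposal is decided
assert spec.proposeP("v3") == "v1"  # a later proposal returns the decision
```

Any sequence of calls returns the same value after the first one, which is exactly the deterministic consensus behaviour that clients reason with.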

In this work we present both specifications and implementations in pseudo-code for an imperative WHILE-like language with basic arithmetic and primitive types, where val is some user-defined type for the values decided by Paxos, and undef is a literal that denotes an undefined value. The pseudo-code is self-explanatory and we refrain from giving it a formal semantics, which could be done in the standard fashion if so wished [30]. At any rate, the pseudo-code is ultimately a vehicle for illustration and we stick to this informal presentation.

The implementation of the modules is depicted in Figs. 3, 4 and 5. We describe the modules following a bottom-up approach, which better fits the purpose of conveying the connection between the deconstruction and SD-Paxos. We start with module *Round-Based Register*, which offers operations read and write (Fig. 3) and implements the replicated processes that adopt the role of acceptors (Fig. 4). We adapt the wait-free, crash-stop implementation of *Round-Based Register* in Fig. 5 of [2] by adding loops for the explicit reception of each individual message and by counting acknowledgement messages one by one. Processes are identified by integers from 1 to n, where n is the number of processes in the system. Proposers and acceptors exchange read and write requests, and the corresponding acknowledgements and non-acknowledgements. We assume a type msg for messages and let the message vocabulary be as follows.

```
1 process Acceptor(int j) {
2   val v := undef; int r := 0; int w := 0;
3   start() {
4     int i; msg m; int k;
5     do { (i, m) := receive();
6       switch (m) {
7         case [RE, k]:
8           if (k < r) { send(i, [nackRE, k]); }
9           else { ⟨ r := k; send(i, [ackRE, k, v, w]); ⟩ }
10        case [WR, k, vW]:
11          if (k < r) { send(i, [nackWR, k]); }
12          else { ⟨ r := k; w := k; v := vW; send(i, [ackWR, k]); ⟩ }
13      } }
14    while (true); } }
```
**Fig. 4.** Implementation of *Round-Based Register* (acceptor).

Read requests [RE, k] carry the proposer's round k. Write requests [WR, k, v] carry the proposer's round k and the proposed value v. Read acknowledgements [ackRE, k, v, k'] carry the proposer's round k, the acceptor's value v, and the round k' at which v was accepted. Read non-acknowledgements [nackRE, k] carry the proposer's round k, and so do write acknowledgements [ackWR, k] and write non-acknowledgements [nackWR, k].

In the pseudo-code, we use \_ for a wildcard that could take any literal value. In the pattern-matching primitives, the literals specify the pattern against which an expression is being matched, and operator @ turns a variable into a literal with the variable's value. Compare the case [ackRE, @k, v, kW]: in Fig. 3, where the value of k specifies the pattern and v and kW get some values assigned, with the case [RE, k]: in Fig. 4, where k gets some value assigned.

We assume the network ensures that messages are neither created, modified, deleted, nor duplicated, and that they are always delivered but with an arbitrarily large transmission delay.<sup>2</sup> Primitive send takes the destination j and the message m, and its effect is to send m from the current process to the process j. Primitive receive takes no arguments, and its effect is to receive at the current process a message m from origin i, after which it delivers the pair (i, m) of identifier and message. We assume that send is non-blocking and that receive blocks and suspends the process until a message is available, in which case the process awakens and resumes execution.

Each acceptor (Fig. 4) keeps a value v, a current round r (called the *read round*), and the round w at which the acceptor's value was last accepted (called the *write round*). Initially, v is undef and both r and w are 0.
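The acceptor's promise/accept logic from Fig. 4 can be reduced to a tiny state machine. A Python sketch (ours; the message-passing loop is stripped away and each handler simply returns the reply it would send):

```python
UNDEF = None  # stands for undef

class Acceptor:
    """Sketch of the acceptor of Fig. 4: a value v, a read round r,
    and the write round w at which v was last accepted."""

    def __init__(self):
        self.v, self.r, self.w = UNDEF, 0, 0

    def on_read(self, k):
        # Grant the promise only if k is at least the current read round.
        if k < self.r:
            return ("nackRE", k)
        self.r = k
        return ("ackRE", k, self.v, self.w)

    def on_write(self, k, vW):
        # Accept only if no higher-round promise has been granted since.
        if k < self.r:
            return ("nackWR", k)
        self.r = self.w = k
        self.v = vW
        return ("ackWR", k)

# The run of Fig. 1 from N2's point of view:
a = Acceptor()
assert a.on_read(1) == ("ackRE", 1, UNDEF, 0)   # promise to N1 at round 1
assert a.on_write(1, "v1") == ("ackWR", 1)      # accept v1 at round 1
assert a.on_read(3) == ("ackRE", 3, "v1", 1)    # N3 learns v1, accepted at 1
assert a.on_read(2) == ("nackRE", 2)            # a stale round is refused
```

Note that, per the paper's footnote, in the real implementation the state update and the sending of the acknowledgement must happen atomically.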

Phase 1 of SD-Paxos is implemented by operation read on the left of Fig. 3. When a proposer issues a read, the operation requests each acceptor's promise to only accept values at a round greater than or equal to k by sending [RE, k]

<sup>2</sup> We allow creation and duplication of [RE, k] messages in Sect. 5, where we obtain Multi-Paxos from SD-Paxos by a series of transformations of the network semantics.


**Fig. 5.** Implementation of *Round-Based Consensus* (left) and *Paxos* (right)

(lines 4–5). When an acceptor receives a [RE, k] (lines 5–7 of Fig. 4) it acknowledges the promise depending on its read round. If k is strictly less than r then the acceptor has already made a promise to another proposer with a greater round, and it sends [nackRE, k] back (line 8). Otherwise, the acceptor updates r to k and acknowledges by sending [ackRE, k, v, w] (line 9). When the proposer receives an acknowledgement (lines 8–10 of Fig. 3) it adds it to the count (line 10) and keeps track of the greatest write round at which any of the acknowledging acceptors accepted a value, storing the corresponding value in maxV (lines 11–12). If a majority of acceptors acknowledged, the operation succeeds and returns (true, maxV) (lines 15–16). Otherwise, if the proposer received some [nackRE, k], the operation fails, returning (false, \_) (lines 13–14).

Phase 2 of SD-Paxos is implemented by operation write on the right of Fig. 3. After having collected promises from a majority of acceptors, the proposer picks the candidate value vW and issues a write. The operation requests each acceptor to accept the candidate value by sending [WR, k, vW] (lines 20–21). When an acceptor receives [WR, k, vW] (line 10 of Fig. 4) it accepts the value depending on its read round. If k is strictly less than r, then the acceptor has already promised a proposer with a greater round, and it sends [nackWR, k] back (line 11). Otherwise, the acceptor fulfils its promise, updates both w and r to k, assigns vW to its value v, and acknowledges by sending [ackWR, k] (line 12). Finally, when the proposer receives an acknowledgement (lines 23–25 of Fig. 3) it adds it to the count (line 26) and checks whether a majority of acceptors have acknowledged, in which case vW is decided and the operation succeeds and returns true (lines 29–30). Otherwise, if the proposer received some [nackWR, k], the operation fails and returns false (lines 27–28).<sup>3</sup>

Next, we describe module *Round-Based Consensus* on the left of Fig. 5. The module offers an operation proposeRC that takes a round k and a proposed value v0, and returns a pair (res, v) of a Boolean and a value, where res reports the success of the operation and v is the decided value in case res is true. We have taken the implementation from Fig. 6 of [2], adapted to our pseudo-code conventions. *Round-Based Consensus* carries out Phase 1 and Phase 2 of

<sup>3</sup> For the implementation to be correct with our shared-memory-concurrency approach, the update of the data in acceptors must happen atomically with the sending of acknowledgements in lines 9 and 12 of Fig. 4.

**Fig. 6.** Two histories in which a failing write contaminates some acceptor.

SD-Paxos as explained in Sect. 2. The operation proposeRC calls read (line 3) and, if the call succeeds, chooses as candidate value either the proposed value v0 or the value v returned by read (line 5). Then the operation calls write with the candidate value and returns (true, v) if write succeeds, or returns (false, \_) (line 8) if either the read or the write fails.
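Abstracting the two phases as function parameters, the control flow of proposeRC can be sketched as follows (Python, ours; read and write are passed in as stubs, which is not how the paper's modules are wired but isolates the logic):

```python
UNDEF = None  # stands for undef

def proposeRC(k, v0, read, write):
    """Control-flow sketch of Round-Based Consensus (left of Fig. 5):
    run Phase 1 via read, pick a candidate, run Phase 2 via write."""
    ok, v = read(k)
    if ok:
        if v is UNDEF:   # no acceptor reported a previously accepted value
            v = v0       # so the proposer's own value is the candidate
        if write(k, v):
            return (True, v)
    return (False, UNDEF)

# With stubs that always succeed and report no previously accepted value,
# the proposed value itself gets decided.
assert proposeRC(1, "v1", lambda k: (True, UNDEF), lambda k, v: True) == (True, "v1")
# If Phase 1 returns a previously accepted value, that value wins instead.
assert proposeRC(3, "v3", lambda k: (True, "v1"), lambda k, v: True) == (True, "v1")
# If either phase fails, the whole operation fails.
assert proposeRC(2, "v2", lambda k: (False, UNDEF), lambda k, v: True) == (False, UNDEF)
```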

Finally, the entry module *Paxos* on the right of Fig. 5 offers an operation proposeP that takes a proposed value v0 and returns the decided value. We assume that the system primitive pid() returns the process identifier of the current process. We have come up with this straightforward implementation of operation proposeP, which calls proposeRC with increasing round until the call succeeds, starting at a round equal to the process identifier pid() and increasing it by the number of processes n in each iteration. This guarantees that the round used in each invocation to proposeRC is unique.
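The round scheme of proposeP can be sanity-checked in a few lines. A Python sketch (ours), assuming n processes with identifiers 1..n:

```python
def rounds(pid, n, attempts):
    """Rounds tried by process pid in a system of n processes:
    pid, pid + n, pid + 2n, ... (as in proposeP on the right of Fig. 5)."""
    return [pid + i * n for i in range(attempts)]

n = 3
all_rounds = [k for pid in range(1, n + 1) for k in rounds(pid, n, 4)]
# Rounds used by distinct proposers never collide.
assert len(all_rounds) == len(set(all_rounds))
# ((k - 1) mod n) + 1 recovers the owner of each round, which is the
# assumption made by the specifications in Sect. 4.
assert all(((k - 1) % n) + 1 == pid
           for pid in range(1, n + 1) for k in rounds(pid, n, 4))
```

The second assertion connects this implementation to the assume(pid() = ((k − 1) mod n) + 1) preconditions appearing in the module specifications below.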

**The Challenge of Verifying the Deconstruction of Paxos.** Verifying each module of the deconstruction separately is cumbersome because of the distributed character of the algorithm and the nature of a linearisation proof. A process may not be aware of the information that will flow from it to other processes, but this future information flow may dictate whether some operation has to be linearised at present. Figure 6 illustrates this challenge.

Let N1, N2 and N3 adopt both the roles of acceptors and proposers, which propose values v1, v2 and v3 with rounds 1, 2 and 3, respectively. Consider the history at the top of the figure. N2 issues a read with round 2 and gets acknowledgements from all but one of the acceptors in a quorum. (Let us call this remaining acceptor A.) None of these acceptors has accepted anything yet, and they all return ⊥ as the last accepted value, at round 0. In parallel, N3 issues a read with round 3 (third line in the figure) and gets acknowledgements from a quorum in which A does not occur. This read succeeds as well and returns (true, undef).

```
1 (bool × val) ptp[1..n] := undef;
2 val abs_vP := undef; single bool abs_resP[1..n] := undef;
3 proposeP(val v0) {
4   int k; bool res; val v; assume(!(v0 = undef));
5   k := pid(); ptp[pid()] := (true, v0);
6   do { ⟨ (res, v) := proposeRC(k, v0);
7     if (res) {
8       for (i := 1, i <= n, i++) {
9         if (ptp[i] = (true, v)) { lin(i); ptp[i] := (false, v); } }
10      if (!(v = v0)) { lin(pid()); ptp[pid()] := (false, v0); } } ⟩
11    k := k + n; }
12  while (!res); return v; }
```
**Fig. 7.** Instrumented implementation of *Paxos*.

Then N3 issues a write with round 3 and value v3. Again, it gets acknowledgements from a quorum in which A does not occur, and the write succeeds, deciding value v3 and returning true. Later on, after the write by N3 in real-time order but in parallel with the read by N2, node N1 issues a write with round 1 and value v1 (first line in the figure). This write is bound to fail because the value v3 was already decided with round 3. However, the write manages to "contaminate" acceptor A with value v1, and A now acknowledges N2, sending v1 as its last accepted value, at round 1. Now N2 has received acknowledgements from a quorum, and since the other acceptors in the quorum returned 0 as the round of their last accepted value, the read catches value v1 accepted at round 1, and the operation succeeds and returns (true, v1). This history linearises by moving N2's read after N1's write, and by respecting the real-time order for the rest of the operations. (The linearisation also has to respect the information-flow order between N1 and N2, *i.e.*, N1 contaminates A with value v1, which is read by N2.)

In the figure, a segment ending in an × indicates that the operation fails. The value returned by a successful read operation is depicted below the end of the segment. The linearisation points are depicted with a thick vertical line, and the dashed arrow indicates that two operations are in the information flow order.

The variation of this scenario at the bottom of Fig. 6 is also possible, where N1's write and N2's read happen concurrently, but where N2's read is shifted backwards to happen, in real-time order, before N3's read and write. Since N1's write happens before N2's read in the information-flow order, N1's write must inexorably linearise before N3's operations, which are the ones that "steal" N1's valid round.

These examples give us three important hints for designing the specifications of the modules. First, after a decision is committed it is *not enough* to store only the decided value, since a later write may contaminate some acceptor with a value different from the decided one. Second, a read operation *may succeed* with some round even if by that time another operation has already succeeded with a higher round. And third, a write with a valid round *may fail* if its round will be "stolen" by a concurrent operation. The non-deterministic specifications that we introduce next allow one to model execution histories such as the ones in Fig. 6.

### **4 Modularly Verifying SD-Paxos**

In this section, we provide non-deterministic specifications for *Round-Based Consensus* and *Round-Based Register* and show that each implementation refines its specification [9]. To do so, we instrument the implementations of all the modules with *linearisation-point* annotations and use Rely/Guarantee reasoning [26].

This time we follow a top-down order and start with the entry module *Paxos*.

**Module** *Paxos*. In order to prove that the implementation on the right of Fig. 5 refines its specification on the right of Fig. 2, we introduce the instrumented implementation in Fig. 7, which uses the helping mechanism for external linearisation points of [18]. We assume that each proposer invokes proposeP with a unique proposed value. The auxiliary pending thread pool ptp[n] is an array of pairs of Booleans and values of length n, where n is the number of processes in the system. A cell ptp[i] containing a pair (true, v) signals that process i proposed value v and the invocation proposeP(v) by process i awaits to be linearised. Once this invocation is linearised, the cell ptp[i] is updated to the pair (false, v). A cell ptp[i] containing undef signals that process i never proposed any value yet. The array abs\_resP[n] of Boolean single-assignment variables stores the abstract result of each proposer's invocation. A linearisation-point annotation lin(i) takes a process identifier i, atomically performs the abstract operation invoked by proposer i, and assigns its result to abs\_resP[i]. The abstract state is modelled by variable abs\_vP, which corresponds to variable vP in the specification on the right of Fig. 2. One invocation of proposeP may help linearise other invocations as follows. The linearisation point is together with the invocation of proposeRC (line 6). If proposeRC committed with some value v, the instrumented implementation traverses ptp and linearises all the proposers that were proposing value v (the proposer may linearise itself in this traversal) (lines 8–9). Then, the current proposer linearises itself if its proposed value v0 is different from v (line 10), and the operation returns v (line 12). All the annotations and code in lines 6–10 are executed inside an atomic block, together with the invocation of proposeRC(k, v0).

**Theorem 1.** *The implementation of Paxos on the right of Fig. 5 linearises with respect to its specification on the right of Fig. 2.*

**Module** *Round-Based Consensus*. The top of Fig. 8 shows the module's non-deterministic specification. Global variable vRC is the decided value, initially undef. Global variable roundRC is the highest round at which some value was decided, initially 0; a global set of values valsRC (initially empty) contains values that may have been proposed by proposers. The specification is non-deterministic in that the local value vD and Boolean b are unspecified, which we model by assigning random values to them. We assume that the current process identifier is ((k − 1) mod n) + 1, which is consistent with how rounds are assigned to each process and incremented in the code of proposeP on the right of Fig. 5. If the unspecified value vD is neither in the set valsRC nor equal to v0, then the operation returns (false, \_) (line 11). This models that the operation fails

```
1 val vRC := undef; int roundRC := 0; set of val valsRC := {};
2 proposeRC(int k, val v0) {
3   ⟨ val vD := random(); bool b := random();
4     assume(!(v0 = undef)); assume(pid() = ((k - 1) mod n) + 1);
5     if (vD ∈ (valsRC ∪ {v0})) {
6       valsRC := valsRC ∪ {vD};
7       if (b && (k >= roundRC)) { roundRC := k;
8         if (vRC = undef) { vRC := vD; }
9         return (true, vRC); }
10      else { return (false, _); } }
11    else { return (false, _); } ⟩ }

1 val abs_vRC := undef; int abs_roundRC := 0;
2 set of val abs_valsRC := {};
3 proposeRC(int k, val v0) {
4   single (bool × val) abs_resRC := undef; bool res; val v;
5   assume(!(v0 = undef)); assume(pid() = ((k - 1) mod n) + 1);
6   ⟨ (res, v) := read(k); if (res = false) { linRC(undef, _); } ⟩
7   if (res) { if (v = undef) { v := v0; }
8     ⟨ res := write(k, v); if (res) { linRC(v, true); }
9       else { linRC(v, false); } ⟩
10    if (res) { return (true, v); } }
11  return (false, _); }
```
**Fig. 8.** Specification (top) and instrumented implementation (bottom) of *Round-Based Consensus*.

without contaminating any acceptor. Otherwise, the operation may contaminate some acceptor and the value vD is added to the set valsRC (line 6). Now, if the unspecified Boolean b is false, then the operation returns (false, \_) (lines 7 and 10), which models that the round will be stolen by a posterior operation. Finally, the operation succeeds if k is greater than or equal to roundRC (line 7), in which case roundRC and vRC are updated and the operation returns (true, vRC) (lines 7–9).
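The non-deterministic specification can be transliterated into executable form by drawing the unspecified vD and b at random. A Python sketch (ours; the universe of values passed to the constructor is an assumption of the sketch, and (False, None) stands for (false, \_)):

```python
import random

UNDEF = None  # stands for undef

class RoundBasedConsensusSpec:
    """Executable sketch of the non-deterministic specification at the
    top of Fig. 8."""

    def __init__(self, values):
        self.vRC = UNDEF      # decided value
        self.roundRC = 0      # highest round at which a value was decided
        self.valsRC = set()   # values that may have been proposed
        self.values = values  # universe the unspecified vD is drawn from

    def proposeRC(self, k, v0):
        vD = random.choice(self.values)       # unspecified value
        b = random.choice([True, False])      # unspecified Boolean
        if vD not in self.valsRC | {v0}:
            return (False, UNDEF)   # fail without contaminating anyone
        self.valsRC.add(vD)         # vD may contaminate some acceptor
        if b and k >= self.roundRC:
            self.roundRC = k
            if self.vRC is UNDEF:
                self.vRC = vD
            return (True, self.vRC)
        return (False, UNDEF)       # the round will be stolen

# Whatever the random choices, all successful calls agree on one value.
random.seed(0)
spec = RoundBasedConsensusSpec(["a", "b", "c"])
decided = None
for k in range(1, 60):
    res, v = spec.proposeRC(k, ["a", "b", "c"][k % 3])
    if res:
        decided = decided if decided is not None else v
        assert v == decided
```

The final loop checks the agreement property that the top-level *Paxos* specification relies on: once vRC is set, every successful proposeRC returns it.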

In order to prove that the implementation in Fig. 5 linearises with respect to the specification on the top of Fig. 8, we use the instrumented implementation at the bottom of the same figure, where the abstract state is modelled by variables abs\_vRC, abs\_roundRC and abs\_valsRC in lines 1–2, the local single-assignment variable abs\_resRC stores the result of the abstract operation, and the linearisation-point annotations linRC(vD, b) take a value and a Boolean as parameters and invoke the non-deterministic abstract operation, disambiguating it by assigning the parameters to the unspecified vD and b of the specification. There are two linearisation points, together with the invocations of read (line 6) and write (line 8). If read fails, then we linearise forcing the unspecified vD to be undef (line 6), which ensures that the abstract operation fails without adding any value to abs\_valsRC or updating the round abs\_roundRC. Otherwise, if write succeeds with value v, then we linearise forcing the unspecified value vD and Boolean b to be v and true respectively (line 8). This ensures that

```
1 read(int k) {
2   ⟨ val vD := random();
3     bool b := random(); val v;
4     assume(vD ∈ valsRR);
5     assume(pid() =
6       ((k - 1) mod n) + 1);
7     if (b) {
8       if (k >= roundRR) {
9         roundRR := k;
10        if (!(vRR = undef)) {
11          v := vRR; }
12        else { v := vD; } }
13      else { v := vD; }
14      return (true, v); }
15    else { return (false, _); } ⟩ }

16 val vRR := undef;
17 int roundRR := 0;
18 set of val valsRR := {undef};
19
20 write(int k, val vW) {
21   ⟨ bool b := random();
22     assume(!(vW = undef));
23     assume(pid() =
24       ((k - 1) mod n) + 1);
25     valsRR := valsRR ∪ {vW};
26     if (b && (k >= roundRR)) {
27       roundRR := k;
28       vRR := vW;
29       return true; }
30     else { return false; } ⟩ }
```
**Fig. 9.** Specification of *Round-Based Register*.

the abstract operation succeeds and updates the round abs\_roundRC to k and assigns v to the decided value abs\_vRC. If write fails then we linearise forcing the unspecified vD and b to be v and false respectively (line 9). This ensures that the abstract operation fails.

**Theorem 2.** *The implementation of Round-Based Consensus in Fig. 5 linearises with respect to its specification on the top of Fig. 8.*

**Module** *Round-Based Register*. Figure 9 shows the module's non-deterministic specification. Global variable vRR represents the decided value, initially undef. Global variable roundRR represents the current round, initially 0, and the global set of values valsRR, initially containing undef, stores values that may have been proposed by some proposer. The specification is non-deterministic in that method read has an unspecified local Boolean b and local value vD (we assume that vD is in valsRR), and method write has an unspecified local Boolean b. We assume the current process identifier is ((k − 1) mod n) + 1.

Let us explain the specification of the read operation. The operation can succeed regardless of the proposer's round k, depending on the value of the unspecified Boolean b. If b is true and the proposer's round k is valid (line 8), then the read round is updated to k (line 9) and the operation returns (true, v) (line 14), where v is the read value, which coincides with the decided value if some decision was committed already, or with vD otherwise. Now to the specification of the write operation. The value vW is always added to the set valsRR (line 25). If the unspecified Boolean b is false (the round will be stolen by a later operation) or if the round k is invalid, then the operation returns false (lines 26 and 30). Otherwise, the current round is updated to k, the decided value vRR is updated to vW, and the operation returns true (lines 27–29).
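To make the non-determinism above concrete, the specification can be mimicked by an executable sketch, here in Python; the class and method names are ours, the unspecified Booleans are modelled by `random.choice`, and the hidden lines of read are reconstructed from the prose:

```python
import random

UNDEF = None  # stands for the undef value of the specification

class RoundBasedRegisterSpec:
    """Executable sketch of the round-based register specification (Fig. 9)."""
    def __init__(self):
        self.v_rr = UNDEF          # decided value vRR
        self.round_rr = 0          # current round roundRR
        self.vals_rr = {UNDEF}     # proposed values valsRR

    def read(self, k):
        b = random.choice([True, False])   # unspecified Boolean b
        if b and k >= self.round_rr:       # round k is valid
            self.round_rr = k
            # v is the decided value if committed, or some vD in valsRR otherwise
            v = self.v_rr if self.v_rr is not UNDEF \
                else random.choice(list(self.vals_rr))
            return (True, v)
        return (False, UNDEF)

    def write(self, k, v_w):
        assert v_w is not UNDEF            # assume(!(vW = undef))
        b = random.choice([True, False])   # unspecified Boolean b
        self.vals_rr.add(v_w)              # vW is always recorded
        if b and k >= self.round_rr:
            self.round_rr = k
            self.v_rr = v_w
            return True
        return False
```

Note how the unspecified b deliberately lets an operation fail even when its round is valid: the failure is later justified, in the implementation, by a round being stolen.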

In order to prove that the implementation in Figs. 3 and 4 linearises with respect to the specification in Fig. 9, we use the instrumented implementation in Figs. 10 and 11, which uses prophecy variables [1,26] that "guess" whether the execution of the method will reach a particular program location or not. The instrumented implementation also uses external linearisation points. In particular, the code of the acceptors may help to linearise some of the invocations to read and write, based on the prophecies and on auxiliary variables that count the number of acknowledgements sent by acceptors after each invocation of a read or a write. The next paragraphs elaborate on our use of prophecy variables and on our helping mechanism.

Variables abs\_vRR, abs\_roundRR and abs\_valsRR in Fig. 10 model the abstract state. They are initially set to undef, 0 and the set containing undef, respectively. Variable abs\_res\_r[k] is an infinite array of single-assignment pairs of Boolean and value that models the abstract results of the invocations of read. (Think of an infinite array as a map from integers to some type; we use the array notation for convenience.) Similarly, variable abs\_res\_w[k] is an infinite array of single-assignment Booleans that models the abstract results of the invocations of write. All the cells in both arrays are initially undef (*i.e.*, the initial maps are empty). Variables count\_r[k] and count\_w[k] are infinite arrays of integers that model the number of acknowledgements sent (but not necessarily received yet) by acceptors in response to read or write requests, respectively. All cells in both arrays are initially 0. The variable proph\_r[k] is an infinite array of single-assignment pairs bool × val, modelling the prophecy for the invocations of read, and variable proph\_w[k] is an infinite array of single-assignment Booleans modelling the prophecy for the invocations of write.

The linearisation-point annotations linRE(k, vD, b) for read take the proposer's round k, a value vD and a Boolean b, and they invoke the abstract operation and disambiguate it by assigning the parameters to the unspecified vD and b of the specification on the left of Fig. 9. At the beginning of a read(k) (lines 11–14 of Fig. 10), the prophecy proph\_r[k] is set to (true, v) if the invocation reaches PL: RE\_SUCC in line 26. The v is defined to coincide with maxV at the time when that location is reached. That is, v is the value accepted at the greatest round by the acceptors acknowledging so far, or undefined if no acceptor ever accepted any value. If the operation reaches PL: RE\_FAIL in line 24 instead, the prophecy is set to (false, \_). (If the method never returns, the prophecy is left undef since it will never linearise.) A successful read(k) linearises in the code of the acceptor in Fig. 11, when the (n + 1)/2th acceptor sends [ackRE, k, v, w], and only if the prophecy is (true, v) and the operation was not linearised before (lines 10–14). We force the unspecified vD and b to be v and true respectively, which ensures that the abstract operation succeeds and returns (true, v). A failing read(k) linearises at the return in the code of read (lines 23–24 of Fig. 10), after the reception of [nackRE, k] from one acceptor. We force the unspecified vD and b to be undef and false respectively, which ensures that the abstract operation fails.

The linearisation-point annotations linWR(k, vW, b) for write take the proposer's round k and value vW, and a Boolean b, and they invoke the abstract operation and disambiguate it by assigning the parameter to the unspecified b

```
1 val abs_vRR := undef; int abs_roundRR := 0;
2 set of val abs_valsRR := {undef};
3 single (bool × val) abs_res_r[1..∞] := undef;
4 single bool abs_res_w[1..∞] := undef;
5 int count_r[1..∞] := 0; int count_w[1..∞] := 0;
6 single (bool × val) proph_r[1..∞] := undef;
7 single bool proph_w[1..∞] := undef;
8 read(int k) {
9   int j; val v; set of int Q; int maxKW; val maxV; msg m;
10   assume(pid() = ((k - 1) mod n) + 1);
11   if (operation reaches PL: RE_SUCC and define v = maxV at that time) {
12     proph_r[k] := (true, v); }
13   else { if (operation reaches PL: RE_FAIL) {
14     proph_r[k] := (false, _); } }
15   for (j := 1, j <= n, j++) { send(j, [RE, k]); }
16   maxKW := 0; maxV := undef; Q := {};
17   do { (j, m) := receive();
18     switch (m) {
19       case [ackRE, @k, v, kW]:
20         Q := Q ∪ {j};
21         if (kW >= maxKW) { maxKW := kW; maxV := v; }
22       case [nackRE, @k]:
23         linRE(k, undef, false); proph_r[k] := undef;
24         return (false, _);  // PL: RE_FAIL
25     } if (|Q| = (n+1)/2) {
26       return (true, maxV); } }  // PL: RE_SUCC
27   while (true); }
28 write(int k, val vW) {
29   int j; set of int Q; msg m;
30   assume(!(vW = undef)); assume(pid() = ((k - 1) mod n) + 1);
31   if (operation reaches PL: WR_SUCC) { proph_w[k] := true; }
32   else { if (operation reaches PL: WR_FAIL) {
33     proph_w[k] := false; } }
34   for (j := 1, j <= n, j++) { send(j, [WR, k, vW]); }
35   Q := {};
36   do { (j, m) := receive();
37     switch (m) {
38       case [ackWR, @k]:
39         Q := Q ∪ {j};
40       case [nackWR, @k]:
41         if (count_w[k] = 0) {
42           linWR(k, vW, false); proph_w[k] := undef; }
43         return false;  // PL: WR_FAIL
44     } if (|Q| = (n+1)/2) {
45       return true; } }  // PL: WR_SUCC
46   while (true); }
```
**Fig. 10.** Instrumented implementation of read and write methods.

```
1 process Acceptor(int j) {
2   val v := undef; int r := 0; int w := 0;
3   start() {
4     int i; msg m; int k;
5     do { (i, m) := receive();
6       switch (m) {
7         case [RE, k]:
8           if (k < r) { send(i, [nackRE, k]); }
9           else { r := k;
10            if (abs_res_r[k] = undef) {
11              if (proph_r[k] = (true, v)) {
12                if (count_r[k] = (n+1)/2 - 1) {
13                  linRE(k, v, true); } } }
14            count_r[k]++; send(i, [ackRE, k, v, w]);  }
15        case [WR, k, vW]:
16          if (k < r) { send(i, [nackWR, k]); }
17          else { r := k; w := k; v := vW;
18            if (abs_res_w[k] = undef) {
19              if (!(proph_w[k] = undef)) {
20                if (proph_w[k]) {
21                  if (count_w[k] = (n+1)/2 - 1) {
22                    linWR(k, vW, true); } }
23                else { linWR(k, vW, false); } } }
24            count_w[k]++; send(i, [ackWR, k]);  }
25      } }
26    while (true); } }
```
**Fig. 11.** Instrumented implementation of acceptor processes.

of the specification on the right of Fig. 9. At the beginning of a write(k, vW) (lines 31–33 of Fig. 10), the prophecy proph\_w[k] is set to true if the invocation reaches PL: WR\_SUCC in line 45, or to false if it reaches PL: WR\_FAIL in line 43 (or it is left undef if the method never returns). A successful write(k, vW) linearises in the code of the acceptor in Fig. 11, when the (n + 1)/2th acceptor sends [ackWR, k], and only if the prophecy is true and the operation was not linearised before (lines 17–24). We force the unspecified b to be true, which ensures that the abstract operation succeeds, deciding value vW and updating roundRR to k. A failing write(k, vW) may linearise either at the return in its own code (lines 41–43 of Fig. 10), if the proposer received one [nackWR, k] and no acceptor sent any [ackWR, k] yet, or in the code of the acceptor, when the first acceptor sends [ackWR, k], and only if the prophecy is false and the operation was not linearised before. In both cases, we force the unspecified b to be false, which ensures that the abstract operation fails.

**Theorem 3.** *The implementation of Round-Based Register in Figs. 10 and 11 linearises with respect to its specification in Fig. 9.*

### **5 Multi-Paxos via Network Transformations**

We now turn to more complicated distributed protocols that build upon the idea of Paxos consensus. Our ultimate goal is to reuse the verification result from Sects. 3 and 4, as well as the high-level round-based register interface. In this section, we will demonstrate how to reason about an implementation of Multi-Paxos as an array of *independent* instances of the *Paxos* module defined previously, despite the subtle dependencies between its sub-components present in Multi-Paxos's "canonical" implementations [5,15,27]. While an abstraction of Multi-Paxos to an array of independent shared "single-shot" registers is almost folklore, what appears to be inherently difficult is to verify a Multi-Paxos-based consensus (*wrt.* the array-based abstraction) by means of *reusing* the proof of an SD-Paxos. All proofs of Multi-Paxos we are aware of are, thus, *non-modular* with respect to the underlying SD-Paxos instances [5,22,24], *i.e.*, they require one to redesign the invariants of the *entire* consensus protocol.

This proof modularity challenge stems from the optimised nature of a classical Multi-Paxos protocol, as well as of its real-world implementations [6]. The idea of this part of our work is to distil such protocol-aware optimisations into a separate *network semantics layer*, and to show that each of them refines the semantics of a Cartesian product-based view, *i.e.*, exhibits the very same client-observable behaviours. To do so, we will establish a refinement between the optimised implementations of Multi-Paxos and a simple Cartesian product abstraction, which will allow us to extend the register-based abstraction, explored earlier in this paper, to what is considered to be a canonical amortised Multi-Paxos implementation.

#### **5.1 Abstract Distributed Protocols**

We start by presenting the formal definitions of encoding distributed protocols (including Paxos), their message vocabularies, protocol-based network semantics, and the notion of observable behaviours.

**Protocols and Messages.** Figure 12 provides basic definitions of the distributed protocols and their components. Each protocol p is a tuple ⟨Δ, M, Sint, Srcv, Ssnd⟩. Δ is a set of local states, which can be assigned to each of the participating nodes, also determining the node's role via an additional tag,<sup>4</sup> if necessary (*e.g.*, acceptor and proposer states in Paxos are different). M is a "message vocabulary", determining the set of messages that can be used for communication between the nodes.

**Fig. 12.** States and transitions.

<sup>4</sup> We leave implicit the consistency laws for the state, which are protocol-specific.

**Fig. 13.** Transition rules of the simple protocol-aware network semantics.

$$\begin{array}{c} \text{StepInt} \\ \dfrac{n \in \mathsf{dom}(\sigma) \qquad \delta = \sigma(n) \qquad \langle \delta, \delta' \rangle \in p.\mathsf{S}_{\mathsf{int}}}{\langle \sigma, M \rangle \stackrel{p}{\Longrightarrow}_{\mathsf{int}} \langle \sigma[n \mapsto \delta'], M \rangle} \end{array}$$

$$\begin{array}{c} \text{StepReceive} \\ \dfrac{m \in M \qquad m.\mathit{active} \qquad m.\mathit{to} \in \mathsf{dom}(\sigma) \qquad \delta = \sigma(m.\mathit{to}) \qquad \langle \delta, m, \delta' \rangle \in p.\mathsf{S}_{\mathsf{rcv}} \qquad m' = m[\mathit{active} \mapsto \mathsf{False}]}{\langle \sigma, M \rangle \stackrel{p}{\Longrightarrow}_{\mathsf{rcv}} \langle \sigma[m.\mathit{to} \mapsto \delta'], (M \setminus \{m\}) \cup \{m'\} \rangle} \end{array}$$

$$\begin{array}{c} \text{StepSend} \\ \dfrac{n \in \mathsf{dom}(\sigma) \qquad \delta = \sigma(n) \qquad \langle \delta, \delta', \mathit{ms} \rangle \in p.\mathsf{S}_{\mathsf{snd}}}{\langle \sigma, M \rangle \stackrel{p}{\Longrightarrow}_{\mathsf{snd}} \langle \sigma[n \mapsto \delta'], M \cup \mathit{ms} \rangle} \end{array}$$
Messages can be thought of as JavaScript-like dictionaries, pairing unique fields (isomorphic to strings) with their values. For the sake of a uniform treatment, we assume that each message m ∈ M has at least two fields, *from* and *to*, that point to the source and the destination node of a message, correspondingly. In addition to that, for simplicity we will assume that each message carries a Boolean field *active*, which is set to True when the message is sent and is set to False when the message is received by its destination node. This flag is required to keep history information about messages sent in the past, which is customary in frameworks for reasoning about distributed protocols [10,23,28]. We assume that a "message soup" M is a multiset of messages (*i.e.*, a set with zero or more copies of each message), we consider that each copy of the same message in the multiset has its own "identity", and we write m ≠ m′ to represent that m and m′ are not the same copy of a particular message.
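These conventions are easy to render concretely. A sketch in Python, where the field and class names are ours (`from` is renamed `frm`, since `from` is a Python keyword), and a per-copy `uid` stands for the "identity" of each copy in the soup:

```python
import itertools
from dataclasses import dataclass, field

_ids = itertools.count()

@dataclass
class Msg:
    """A message: a dictionary-like record with the mandatory fields."""
    frm: int                 # the 'from' field (source node)
    to: int                  # destination node
    content: tuple           # protocol-specific payload
    active: bool = True      # True until received; kept afterwards as history
    uid: int = field(default_factory=lambda: next(_ids))  # per-copy identity

# The "message soup" as a list: two copies of the same message coexist,
# each with its own identity (a distinct uid).
soup = [Msg(1, 2, ("RE", 3)), Msg(1, 2, ("RE", 3))]
m1, m2 = soup
assert (m1.frm, m1.to, m1.content) == (m2.frm, m2.to, m2.content)
assert m1.uid != m2.uid   # equal payloads, but not the same copy
```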

Finally, Sint, Srcv and Ssnd are step-relations that correspond to the internal changes in the local state of a node (Sint), as well as to the changes associated with sending (Ssnd) and receiving (Srcv) messages by a node, as allowed by the protocol. Specifically, Sint relates a local node state before and after the allowed internal change; Srcv relates the initial state and an incoming message m ∈ M with the resulting state; Ssnd relates the initial state, the output state and the set of atomically sent messages. For simplicity we will assume that id ⊆ Sint.

In addition, we consider Δ₀ ⊆ Δ, the set of the allowed *initial* states, in which the system can be at the very beginning of its execution. The global state of the network σ ∈ Σ is a map from node identifiers (n ∈ Nodes) to local states from the set of states Δ, defined by the protocol.

**Simple Network Semantics.** The simple operational semantics of the network, =⇒ ⊆ (Σ × ℘(M)) × (Σ × ℘(M)), is parametrised by a protocol p and relates an initial *configuration* (*i.e.*, a global state and a set of messages) to the resulting configuration. It is defined as the reflexive closure of the union of three relations =⇒int ∪ =⇒rcv ∪ =⇒snd, whose rules are given in Fig. 13.

The rule StepInt corresponds to a node n, picked non-deterministically from the domain of the global state σ, executing an internal transition, thus changing its local state from δ to δ′. The rule StepReceive non-deterministically picks a message m from the message soup M, changes the state using the protocol's receive-step relation p.Srcv at the corresponding host node *to*, and updates its local state accordingly in the common mapping (σ[*to* ↦ δ′]). Finally, the rule StepSend non-deterministically picks a node n and executes a send-step, which results in updating its local state and the emission of a set of messages ms, which is added to the resulting soup. In order to "bootstrap" the execution, the initial states from the set Δ₀ ⊆ Δ are assigned to the nodes.
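The three rules can be read as a tiny nondeterministic interpreter over configurations ⟨σ, M⟩. A minimal sketch in Python, assuming a protocol object whose `s_int`, `s_rcv` and `s_snd` methods enumerate the successors allowed by the respective step-relations (all names are illustrative):

```python
import random

def step(protocol, sigma, soup):
    """One step of the simple network semantics: nondeterministically pick
    an applicable rule (StepInt / StepReceive / StepSend) and apply it."""
    choices = []
    for n, delta in sigma.items():
        for delta2 in protocol.s_int(delta):          # StepInt candidates
            choices.append(("int", n, delta2, None))
        for delta2, ms in protocol.s_snd(delta):      # StepSend candidates
            choices.append(("snd", n, delta2, ms))
    for m in soup:                                    # StepReceive candidates
        if m["active"] and m["to"] in sigma:
            for delta2 in protocol.s_rcv(sigma[m["to"]], m):
                choices.append(("rcv", m["to"], delta2, m))
    if not choices:
        return sigma, soup        # reflexive closure: nothing applicable
    kind, n, delta2, extra = random.choice(choices)
    sigma = {**sigma, n: delta2}
    if kind == "snd":
        soup = soup + extra       # atomically emitted messages join the soup
    elif kind == "rcv":
        extra["active"] = False   # received copy stays in the soup as history
    return sigma, soup
```

The reflexive case falls out of returning the configuration unchanged when no rule applies; the *active* flag is flipped rather than the message being deleted, keeping the history as described above.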

We next define the observable protocol behaviours *wrt.* the simple network semantics as the prefix-closed set of all of the system's configuration traces.

#### **Definition 1. (Protocol behaviours)**

$$\mathcal{B}\_p = \bigcup\_{m \in \mathbb{N}} \left\{ \langle \langle \sigma\_0, M\_0 \rangle, \dots, \langle \sigma\_m, M\_m \rangle \rangle \, \middle| \, \begin{array}{l} \exists \delta\_0^{n \in N} \in \Delta\_0, \,\sigma\_0 = \uplus\_{n \in N} [n \mapsto \delta\_0^n] \wedge \\ \qquad \langle \sigma\_0, M\_0 \rangle \stackrel{p}{\Longrightarrow} \dots \stackrel{p}{\Longrightarrow} \langle \sigma\_m, M\_m \rangle \end{array} \right\}$$

That is, the set of behaviours captures all possible configurations of initial states for a fixed set of nodes N ⊆ Nodes. In this case, the set of nodes N is an implicit parameter of the definition, which we fix in the remainder of this section.

*Example 1 (Encoding SD-Paxos).* An abstract distributed protocol for SD-Paxos can be extracted from the pseudo-code of Sect. 3 by providing a suitable small-step operational semantics à la Winskel [30]. We refrain from giving such a formal semantics here, but in Appendix D of the extended version of the paper we outline how the distributed protocol would be obtained from the given operational semantics and from the code in Figs. 3, 4 and 5.

#### **5.2 Out-of-Thin-Air Semantics**

We now introduce an intermediate version of a simple protocol-aware semantics that generates messages "out of thin air" according to a certain predicate P ⊆ Δ × M, which determines whether the network generates a certain message without exercising the corresponding send-transition. The rule is as follows:

$$\begin{array}{c} \text{OTASend} \\ \dfrac{n \in \mathsf{dom}(\sigma) \qquad \delta = \sigma(n) \qquad \mathcal{P}(\delta, m) \qquad M' = M \cup \{m\}}{\langle \sigma, M \rangle \stackrel{p,\,\mathcal{P}}{\Longrightarrow}_{\mathsf{ota}} \langle \sigma, M' \rangle} \end{array}$$

That is, in the semantics described by =⇒ ∪ =⇒ota, a random message m can be sent at any moment, given that the node n, "on behalf of which" the message is sent, is in a state δ such that P(δ, m) holds.

*Example 2.* In the context of Single-Decree Paxos, we can define P as follows:

$$\mathcal{P}(\delta, m) \triangleq m.\mathit{content} = \mathsf{[RE, k]} \land \delta.\mathsf{pid} = n \land \delta.\mathsf{role} = \mathit{Proposer} \land k \le \delta.\mathsf{kP}$$

In other words, if a node n is a *Proposer* currently operating with a round δ.kP, the network semantics can always send another request "on its behalf", thus generating the message "out-of-thin-air". Importantly, the last conjunct in the definition of P is in terms of ≤, rather than equality. This means that the predicate is intentionally loose, allowing for sending even "stale" messages, with expired rounds that are smaller than what n currently holds (no harm in that!).
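As a concrete rendering, the predicate from Example 2 can be written as a Boolean function over a proposer's local state and a candidate message (the dictionary encoding and field names are our own):

```python
def ota_predicate(delta, m):
    """P(delta, m) of Example 2: the network may invent an [RE, k] request
    on behalf of proposer n, for any round k up to n's current round kP."""
    if m["content"][0] != "RE":
        return False
    k = m["content"][1]
    return (m["from"] == delta["pid"]
            and delta["role"] == "Proposer"
            and k <= delta["kP"])    # <=, not =: stale rounds are allowed

delta = {"pid": 1, "role": "Proposer", "kP": 5}
assert ota_predicate(delta, {"from": 1, "to": 2, "content": ("RE", 3)})      # stale round
assert not ota_predicate(delta, {"from": 1, "to": 2, "content": ("RE", 7)})  # ahead of kP
```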

By definition of single-decree Paxos protocol, the following lemma holds:

**Lemma 1 (OTA refinement).** *The behaviours of the OTA-extended semantics* =⇒ ∪ =⇒ota *are contained in* B*p*, *where* p *is an instance of the module Paxos, as defined in Sect. 3 and in Example 1.*

#### **5.3 Slot-Replicating Network Semantics**

With the basic definitions at hand, we now proceed to describing alternative network behaviours that make use of a specific protocol p = ⟨Δ, M, Sint, Srcv, Ssnd⟩, which we will consider fixed for the remainder of this section, so we will at times refer to its components (*e.g.*, Sint, Srcv, *etc*.) without a qualifier.

$$\begin{array}{c} \text{SRStepInt} \\ \dfrac{i \in I \qquad n \in \mathsf{dom}(\sigma) \qquad \delta = \sigma(n)[i] \qquad \langle \delta, \delta' \rangle \in p.\mathsf{S}_{\mathsf{int}} \qquad \sigma' = \sigma[n \mapsto \sigma(n)[i \mapsto \delta']]}{\langle \sigma, M \rangle \stackrel{\times}{\Longrightarrow}_{\mathsf{int}} \langle \sigma', M \rangle} \end{array}$$

$$\begin{array}{c} \text{SRStepSend} \\ \dfrac{i \in I \qquad n \in \mathsf{dom}(\sigma) \qquad \delta = \sigma(n)[i] \qquad \langle \delta, \delta', \mathit{ms} \rangle \in p.\mathsf{S}_{\mathsf{snd}} \qquad \mathit{ms}' = \{m[\mathit{slot} \mapsto i] \mid m \in \mathit{ms}\} \qquad \sigma' = \sigma[n \mapsto \sigma(n)[i \mapsto \delta']]}{\langle \sigma, M \rangle \stackrel{\times}{\Longrightarrow}_{\mathsf{snd}} \langle \sigma', M \cup \mathit{ms}' \rangle} \end{array}$$

$$\begin{array}{c} \text{SRStepReceive} \\ \dfrac{\begin{array}{c} m \in M \qquad m.\mathit{active} \qquad m.\mathit{to} \in \mathsf{dom}(\sigma) \qquad \delta = \sigma(m.\mathit{to})[m.\mathit{slot}] \qquad \langle \delta, m, \delta' \rangle \in p.\mathsf{S}_{\mathsf{rcv}} \\ m' = m[\mathit{active} \mapsto \mathsf{False}] \qquad \sigma' = \sigma[m.\mathit{to} \mapsto \sigma(m.\mathit{to})[m.\mathit{slot} \mapsto \delta']] \qquad M' = (M \setminus \{m\}) \cup \{m'\} \end{array}}{\langle \sigma, M \rangle \stackrel{\times}{\Longrightarrow}_{\mathsf{rcv}} \langle \sigma', M' \rangle} \end{array}$$

**Fig. 14.** Transition rules of the slot-replicating network semantics.

Figure 14 describes the semantics of a *slot-replicating* (SR) network that exercises multiple copies of the *same* protocol instance p*i* for i ∈ I, some, possibly infinite, set of indices, to which we will also be referring as *slots*. Multiple copies of the protocol are incorporated by enhancing the messages from p's vocabulary M with the corresponding indices, and by implementing the on-site dispatch of the indexed messages to the corresponding protocol instances at each node. The local protocol state of each node is, thus, no longer a single element being updated, but rather an *array*, mapping each i ∈ I to δ*i*, the corresponding local state component. The small-step relation for the SR semantics is denoted by =⇒×. The rule SRStepInt is similar to StepInt of the simple semantics, with the difference that it picks not only a node but also an index i, thus referring to a specific component σ(n)[i] as δ and updating it correspondingly (σ(n)[i ↦ δ′]). For the remaining transitions, we postulate that the messages from p's vocabulary p.M are enhanced to have a dedicated field *slot*, which indicates the protocol copy at a node to which the message is directed. The receive-rule SRStepReceive is similar to StepReceive but takes into account the value of m.*slot* in the received message m, thus redirecting it to the corresponding protocol instance and updating the local state appropriately. Finally, the rule SRStepSend can now be executed for any slot i ∈ I, reusing most of the logic of the initial protocol and otherwise mimicking its simple network semantics counterpart StepSend.
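The slot dispatch of SRStepReceive is the essential difference from the simple semantics. A sketch in Python, where a node's local state is a dictionary from slots to per-slot protocol states, and `s_rcv` enumerates the successors of the core protocol's receive relation (names are ours):

```python
def sr_receive(sigma, soup, s_rcv):
    """SRStepReceive: deliver one active message to the protocol copy
    indicated by its slot, leaving all other slots untouched."""
    for m in soup:
        if not m["active"] or m["to"] not in sigma:
            continue
        node, i = m["to"], m["slot"]
        for delta2 in s_rcv(sigma[node][i], m):
            m["active"] = False                       # the copy becomes history
            new_slots = {**sigma[node], i: delta2}    # only slot i changes
            return {**sigma, node: new_slots}, soup
    return sigma, soup
```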

Importantly, in this semantics, for two different slots i, j, such that i ≠ j, the corresponding "projections" of the state behave *independently* of each other. Therefore, transitions and messages in the protocol instances indexed by i at different nodes *do not interfere* with those indexed by j. This observation can be stated formally. In order to do so, we first define the behaviours of slot-replicating networks and their projections as follows:

**Definition 2 (Slot-replicating protocol behaviours).**

$$\mathcal{B}\_{\times} = \bigcup\_{m \in \mathbb{N}} \left\{ \langle \langle \sigma\_0, M\_0 \rangle, \dots, \langle \sigma\_m, M\_m \rangle \rangle \, \middle| \, \begin{array}{l} \exists \delta\_0^{n \in N} \in \Delta\_0, \\ \sigma\_0 = \uplus\_{n \in N} [n \mapsto \{i \mapsto \delta\_0^n \mid i \in I\}] \wedge \\ \langle \sigma\_0, M\_0 \rangle \stackrel{\times}{\Longrightarrow} \dots \stackrel{\times}{\Longrightarrow} \langle \sigma\_m, M\_m \rangle \end{array} \right\}$$

That is, the slot-replicated behaviours are merely behaviours with respect to networks whose nodes hold *multiple instances* of the same protocol, indexed by slots i ∈ I. For a slot i ∈ I, we define the *projection* B×|*i* as the set of global state traces where each node's local state is restricted to its ith component. The following simulation lemma holds naturally, connecting the slot-replicating network semantics and the simple network semantics.

**Lemma 2 (Slot-replicating simulation).** *For all* I *and* i ∈ I*,* B×|*i* = B*p*.

*Example 3 (Slot-replicating semantics and Paxos).* Given our representation of Paxos using roles (acceptors/proposers) encoded via the corresponding parts of the local state δ, we can construct a "naïve" version of Multi-Paxos by using the SR semantics for the protocol. In it, every slot corresponds to an SD-Paxos instance, not interacting with any other slots. From a practical perspective, such an implementation is rather non-optimal, as it does not exploit dependencies between rounds accepted at different slots.

### **5.4 Widening Network Semantics**

We next consider a version of the SR semantics, extended with a new rule for handling received messages. In the new semantics, dubbed *widening*, a node, upon receiving a message m ∈ T, where T ⊆ p.M, for a slot i, *replicates* it for all slots from the index set I, for the very same node. The new rule is as follows:

$$\begin{array}{c} \text{WStepReceive}_T \\ \dfrac{\begin{array}{c} m \in M \qquad m.\mathit{active} \qquad m.\mathit{to} \in \mathsf{dom}(\sigma) \qquad \delta = \sigma(m.\mathit{to})[m.\mathit{slot}] \qquad \langle \delta, m, \delta' \rangle \in p.\mathsf{S}_{\mathsf{rcv}} \\ m' = m[\mathit{active} \mapsto \mathsf{False}] \qquad \sigma' = \sigma[m.\mathit{to} \mapsto \sigma(m.\mathit{to})[m.\mathit{slot} \mapsto \delta']] \\ \mathit{ms} = \text{if } (m \in T) \text{ then } \{m'' \mid m'' = m[\mathit{slot} \mapsto j],\ j \in I\} \text{ else } \emptyset \end{array}}{\langle \sigma, M \rangle \stackrel{\nabla}{\Longrightarrow} \langle \sigma', (M \setminus \{m\}) \cup \{m'\} \cup \mathit{ms} \rangle} \end{array}$$

At first, this semantics seems rather unreasonable: it might create more messages than the system can "consume". However, it is possible to prove that, under certain conditions on the protocol p, the set of behaviours observed under this semantics (*i.e.*, with SRStepReceive replaced by WStepReceiveT) is *not larger* than B<sup>×</sup> as given by Definition 2. To state this formally we first relate the set of "triggering" messages <sup>T</sup> from WStepReceiveT to a specific predicate <sup>P</sup>.
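Operationally, the widening receive step differs from the slot-replicating one only in the replication of triggering messages. A Python sketch under the same encoding as before, where `trigger` plays the role of membership in T (names are ours):

```python
def widening_receive(sigma, soup, s_rcv, trigger, slots):
    """WStepReceive_T: like the slot-replicating receive, but a message in
    the triggering set T is additionally re-sent to every slot of the
    destination node."""
    for m in soup:
        if not m["active"] or m["to"] not in sigma:
            continue
        node, i = m["to"], m["slot"]
        for delta2 in s_rcv(sigma[node][i], m):
            m["active"] = False
            # replicate the triggering message "out of thin air" for all slots
            replicas = ([{**m, "slot": j, "active": True} for j in slots]
                        if trigger(m) else [])
            return ({**sigma, node: {**sigma[node], i: delta2}},
                    soup + replicas)
    return sigma, soup
```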

**Definition 3 (OTA-compliant message sets).** The set of messages T ⊆ p.M is OTA-compliant with the predicate P iff for any b ∈ B*p* and ⟨σ, M⟩ ∈ b, if m ∈ M, then P(σ(m.from), m).

In other words, the protocol p is relaxed enough to "justify" the presence of m in the soup at *any* execution, by providing the predicate P, relating the message to the corresponding sender's state. Next, we connect OTA-compliance to the slot-replicating and widening semantics via the following definition.

**Definition 4 (**P**-monotone protocols).** A protocol p is P-monotone iff for any b ∈ B×, ⟨σ, M⟩ ∈ b, m ∈ M, i = m.*slot*, and j ≠ i, if P(σ(m.from)[i], m′) then we also have P(σ(m.from)[j], m′), where m′ "removes" the *slot* field from m.

Less formally, Definition 4 ensures that in a slot-replicated product × of a protocol p, different components cannot get "out of sync" *wrt.* P. Specifically, if a node in the ith projection is related to a certain message m via P, then any other projection j of the same node will be P-related to this message as well.

*Example 4.* This is a "non-example". A version of slot-replicated SD-Paxos where we allow for arbitrary increments of the round *per slot* at the same proposer node (*i.e.*, out of sync) would not be monotone *wrt.* P from Example 2. In contrast, a slot-replicated product of SD-Paxos instances with fixed rounds is monotone *wrt.* the same P.

**Lemma 3.** *If* T *from* WStepReceiveT *is OTA-compliant with a predicate* P *such that the behaviours of* =⇒ ∪ =⇒ota *are contained in those of* =⇒*, and* p *is* P*-monotone, then the behaviours of the widening semantics* =⇒∇ *are contained in* B×*.*

*Example 5 (Widening semantics and Paxos).* The SD-Paxos instance as described in Sect. 3 satisfies the refinement condition from Lemma 3. By taking <sup>T</sup> <sup>=</sup> {<sup>m</sup> <sup>|</sup> <sup>m</sup> <sup>=</sup> {*content* <sup>=</sup> [RE, k]; ...}} and using Lemma 3, we obtain the refinement between widened semantics and SR semantics of Paxos.

### **5.5 Optimised Widening Semantics**

Our next step towards a realistic implementation of Multi-Paxos out of SD-Paxos instances is enabled by an observation that in the widening semantics, the replicated messages are *always* targeting the same node, to which the initial message m ∈ T was addressed. This means that we can optimise the receive-step, making it possible to execute multiple receive-transitions of the core protocol in batch. The following rule OWStepReceiveT captures this intuition formally:

$$\begin{array}{c} \text{OWStepReceive}_T \\ \dfrac{m \in M \qquad m.\mathit{active} \qquad m.\mathit{to} \in \mathsf{dom}(\sigma) \qquad \langle \sigma', \mathit{ms} \rangle = \mathsf{receiveAndAct}(\sigma, n, m)}{\langle \sigma, M \rangle \stackrel{\nabla^*}{\Longrightarrow}_{\mathsf{rcv}} \langle \sigma', (M \setminus \{m\}) \cup \{m[\mathit{active} \mapsto \mathsf{False}]\} \cup \mathit{ms} \rangle} \end{array}$$

where receiveAndAct(σ, n, m) ≜ ⟨σ′, ms⟩, such that

$$\mathit{ms} = \bigcup\nolimits_j \{m'[\mathit{slot} \mapsto j] \mid m' \in \mathit{ms}_j\},$$
$$\forall j \in I,\ \delta_j = \sigma(m.\mathit{to})[j] \,\wedge\, \langle \delta_j, m, \delta_j^1 \rangle \in p.\mathsf{S}_{\mathsf{rcv}} \,\wedge\, \langle \delta_j^1, \delta_j^2 \rangle \in p.\mathsf{S}_{\mathsf{int}}^* \,\wedge\, \langle \delta_j^2, \delta_j^3, \mathit{ms}_j \rangle \in p.\mathsf{S}_{\mathsf{snd}},$$
$$\forall j \in I,\ \sigma'(m.\mathit{to})[j] = \delta_j^3.$$

In essence, the rule OWStepReceiveT blends several steps of the widening semantics together for a single message: (a) it first receives the message and replicates it for all slots at the destination node; (b) it performs receive-steps for the message's replicas at each slot; (c) it takes a number of internal steps, allowed by the protocol's Sint; and (d) it takes a send-transition, eventually sending all emitted messages, instrumented with the corresponding slots.
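Steps (a)–(d) can be sketched as a batch-processing helper, here in Python with the step-relations simplified to functions, giving a deterministic sketch of receiveAndAct (names are ours):

```python
def receive_and_act(sigma, m, s_rcv, s_int_star, s_snd, slots):
    """OWStepReceive_T helper: batch-process a single incoming message at
    every slot of the destination node (steps (a)-(d) of the rule)."""
    node = m["to"]
    new_slots = dict(sigma[node])
    out = []
    for j in slots:
        d1 = s_rcv(new_slots[j], m)       # (b) receive m's replica at slot j
        d2 = s_int_star(d1)               # (c) some internal steps
        d3, ms_j = s_snd(d2)              # (d) a send-transition
        new_slots[j] = d3
        out += [{**mj, "slot": j} for mj in ms_j]   # tag emissions with j
    return {**sigma, node: new_slots}, out
```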

*Example 6.* Continuing Example 5, with the same parameters, the optimised semantics will execute the transitions of an acceptor, *for all slots*, triggered by receiving a single [RE, k] message for a particular slot, sending back *all* the results for all the slots, which might either agree to accept the value or reject it.

The following lemma relates the optimised and the widening semantics.

**Lemma 4 (Refinement for OW semantics).** *For any* b ∈ B∇∗ *there exists* b′ ∈ B∇*, such that* b *can be obtained from* b′ *by replacing sequences of configurations* [⟨σk, Mk⟩, . . . , ⟨σk+m, Mk+m⟩] *that have just a single node* n*, whose local state is affected in* σk, . . . , σk+m*, by* [⟨σk, Mk⟩, ⟨σk+m, Mk+m⟩]*.*

That is, behaviours in the optimised semantics are the same as in the widening semantics, modulo some sequences of locally taken steps that are being "compressed" to just the initial and the final configurations.

#### **5.6 Bunching Semantics**

As the last step towards Multi-Paxos, we introduce the final network semantics, which optimises executions according to =⇒∇∗ described in the previous section even further, by making a simple addition to the message vocabulary of a slot-replicated SD-Paxos: *bunched messages*. A bunched message simply packages


where *bunch*(ms, n1, n2) = {*msgs* = ms; *from* = n1; *to* = n2; *active* = True} .

**Fig. 15.** Added rules of the bunching semantics.

together several messages, obtained typically as a result of a "compressed" execution via the optimised semantics from Sect. 5.5. We define the two new rules for packaging and "unpackaging" certain messages in Fig. 15. The two new rules can be added to enhance either of the versions of the slot-replicating semantics shown before. In essence, the only effect they have is to combine the messages resulting from the execution of the corresponding steps of the optimised widening (via BStepRecvB), and to unpackage the messages ms from a bunched message, adding them back to the soup (via BStepRecvU). The following natural refinement result holds:
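The packaging side is just the *bunch* constructor defined below Fig. 15; unpackaging returns the payload to the soup. A sketch in Python (function names follow the figure, the encoding is ours):

```python
def bunch(ms, n1, n2):
    """Package several messages into a single bunched message (Fig. 15):
    {msgs = ms; from = n1; to = n2; active = True}."""
    return {"msgs": list(ms), "from": n1, "to": n2, "active": True}

def unbunch(soup, m):
    """BStepRecvU-style unpackaging: consume the bunched message m and
    return its payload messages to the soup for on-demand processing."""
    rest = [x for x in soup if x is not m]     # drop this particular copy
    return rest + [{**x} for x in m["msgs"]]   # payload copies rejoin the soup
```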

**Lemma 5.** *For any* b ∈ BB *there exists* b′ ∈ B∇∗*, such that* b′ *can be obtained from* b *by replacing all bunched messages in* b *by their* msgs*-components.*

The rule BStepRecvU enables effective local caching of the bunched messages, so they are processed *on demand* on the recipient side (*i.e.*, by the per-slot proposers), allowing the implementation to *skip* an entire round of Phase 1.

$$(\stackrel{B}{\Rightarrow}) \;\xrightarrow[\text{Lem. 5}]{\text{refines}}\; (\stackrel{\nabla^*}{\Rightarrow}) \;\xrightarrow[\text{Lem. 4}]{\text{refines}}\; (\stackrel{\nabla}{\Rightarrow}) \;\xrightarrow[\text{Lem. 3}]{\text{refines}}\; (\stackrel{\times}{\Rightarrow}) \;\xrightarrow[\text{Lem. 2}]{\text{sim.}}\; (\stackrel{p}{\Rightarrow} \cup \stackrel{p,\mathcal{P}}{\Rightarrow}_{\mathsf{ota}}) \;\xrightarrow[\text{Lem. 1}]{\text{refines}}\; (\stackrel{p}{\Rightarrow})$$

**Fig. 16.** Refinement between different network semantics.

```
1 proposeM(val^ v, val v0) {
2   assume(!(v0 = undef));
3   if (*v = undef) { *v := v0; }
4   return *v;  }

5 val vM[1..∞] := undef;
6 getR(int s) { return &(vM[s]); }

7 proposeM(getR(1), v);
8 proposeM(getR(2), v);
```
**Fig. 17.** Specification of *Multi-Paxos* and interaction via a *register provider*.

#### **5.7 The Big Picture**

What exactly have we achieved by introducing the family of semantics described above? As illustrated in Fig. 16, all behaviours of the leftmost-topmost, bunching semantics, which corresponds precisely to an implementation of Multi-Paxos with an "amortised" Phase 1, can be transitively related to the corresponding behaviours in the rightmost, vanilla slot-replicated version of the simple semantics (via the correspondence from Lemma 1) by constructing the corresponding refinement mappings [1], delivered by the proofs of Lemmas 3–5.

From the perspective of Rely/Guarantee reasoning, which was employed in Sect. 4, the refinement result from Fig. 16 justifies the replacement of a semantics on the right of the diagram by one to the left of it, as all program-level assertions will remain substantiated by the corresponding system configurations, as long as they are *stable* (*i.e.*, resilient *wrt.* transitions taken by nodes different from the one being verified), which they are in our case.

### **6 Putting It All Together**

We culminate our story of faithfully deconstructing and abstracting Paxos via a round-based register, as well as recasting Multi-Paxos via a series of network transformations, by showing how to *implement* the register-based abstraction from Sect. 3 in tandem with the network semantics from Sect. 5 in order to deliver a provably correct, yet efficient, implementation of Multi-Paxos.

The crux of the composition of the two results—a register-based abstraction of SD-Paxos and a family of semantics-preserving network transformations—is a convenient interface for the end client, so that she can interact with a consensus instance via the proposeM method in lines 1–4 of Fig. 17, no matter which particular slot of a Multi-Paxos implementation she is interacting with. To do so, we introduce a *register provider*—a service that gives a client a "reference" to the consensus object to interact with. Lines 7–8 of Fig. 17 illustrate the interaction with the service provider, where the client requests two specific slots, 1 and 2, of Multi-Paxos by invoking getR with a slot parameter. In both cases the client proposes the very same value v, and the two instances run the same machinery. (Notice that, except for the reference to the consensus object, proposeM is identical to proposeP on the right of Fig. 2, which we have verified *wrt.* linearisability in Sect. 3.)

The implementation of Multi-Paxos that we have in mind resembles the one in Figs. 3, 4 and 5 of Sect. 3, but where all the global data is provided by the register provider and passed by reference. What differs in this implementation with respect to the one in Sect. 3, and is hidden from the client, is the semantics of the network layer used by the bottom layer (*cf.* left part of Fig. 2) of the register-based implementation. The Multi-Paxos instances run (without changing the register's code) over this network layer, which "overloads" the meaning of the send/receive primitives from Figs. 3 and 4 to follow the bunching network semantics described in Sect. 5.6.

**Theorem 4.** *The implementation of Multi-Paxos that uses a register provider and bunching network semantics refines the specification in Fig. 17.*

We implemented the register/network semantics in a proof-of-concept prototype written in Scala/Akka.<sup>5</sup> We relied on the abstraction mechanisms of Scala, allowing us to implement the register logic, verified in Sect. 4, separately from the network middleware, which provides the family of semantics from Sect. 5. Together, they yield a family of provably correct, modularly verified *distributed* implementations, coming with a simple *shared memory-like* interface.

### **7 Related Work**

**Proofs of Linearisability via Rely/Guarantee.** Our work builds on the results of Boichat *et al.* [3], who were the first to propose a systematic deconstruction of Paxos into read/write operations of a *round-based register* abstraction. We extend and harness those abstractions by intentionally introducing more non-determinism into them, which allows us to provide the first modular (*i.e.*, mutually independent) proofs of Proposer and Acceptor using Rely/Guarantee with linearisation points and prophecies. While several logics have been proposed recently to prove linearisability of concurrent implementations using Rely/Guarantee reasoning [14,18,19,26], none of them considers message-passing distributed systems or consensus protocols.

**Verification of Paxos-Family Algorithms.** Formal verification of different versions of Paxos-family protocols *wrt.* inductive invariants and liveness has been a focus of multiple verification efforts in the past fifteen years. To name just a few, Lamport has specified and verified Fast Paxos [17] using TLA+ and its accompanying model checker [32]. Chand *et al.* used TLA+ to specify and verify a Multi-Paxos implementation similar to the one we considered in this work [5]. A version of SD-Paxos has been verified by Kellomaki using the PVS theorem prover [13]. Jaskelioff and Merz have verified Disk Paxos in Isabelle/HOL [12]. More recently, Rahli *et al.* formalised an executable version of Multi-Paxos in EventML [24], a dialect of NuPRL. Dragoi *et al.* [8] implemented and verified SD-Paxos in the PSync framework, which implements a partially synchronised model [7], supporting automated proofs of system invariants. Padon *et al.* have proved the system invariants and the consensus property of both simple Paxos and Multi-Paxos using the verification tool Ivy [22,23].

Unlike all those verification efforts that consider (Multi-/Disk/Fast/...)Paxos as a *single monolithic protocol*, our approach provides the first *modular* verification of single-decree Paxos in a Rely/Guarantee framework, as well as the first verification of Multi-Paxos that directly reuses the proof of SD-Paxos.

<sup>5</sup> The code is available at https://github.com/certichain/protocol-combinators.

**Compositional Reasoning about Distributed Systems.** Several recent works have partially addressed modular formal verification of distributed systems. The IronFleet framework by Hawblitzel *et al.* has been used to verify both safety and liveness of a real-world implementation of a Paxos-based replicated state machine library and a lease-based shared key-value store [10]. While the proof is structured in a modular way by composing specifications, similarly to our decomposition in Sects. 3 and 4, that work does not address linearisability and does not provide composition of proofs about complex protocols (*e.g.*, Multi-Paxos) from proofs about their subparts.

The Verdi framework for deductive verification of distributed systems [29,31] suggests the idea of *Verified System Transformers* (VSTs) as a way to provide *vertical composition* of distributed system implementations. While Verdi's VSTs are similar in purpose and spirit to our network transformations, they *do not* exploit the properties of the protocol, which was crucial for our verification of the Multi-Paxos implementation.

The Disel framework [25,28] addresses the problem of *horizontal composition* of distributed protocols and their client applications. While we do not compose Paxos with any clients in this work, we believe our register-based specification could be directly employed for verifying applications that use Paxos as a subcomponent, as demonstrated by our prototype implementation.

### **8 Conclusion and Future Work**

We have proposed and explored two complementary mechanisms for modular verification of Paxos-family consensus protocols [15]: (a) non-deterministic register-based specifications in the style of Boichat *et al.* [3], which allow one to decompose the proof of the protocol's linearisability into separate independent "layers", and (b) a family of protocol-aware transformations of network semantics, making it possible to reuse the verification effort. We believe that the applicability of these mechanisms spreads beyond reasoning about Paxos and its variants and that they can be used for verifying other consensus protocols, such as Raft [21] and PBFT [4]. We are also going to employ network transformations to verify implementations of Mencius [20], and to accommodate more protocol-specific optimisations, such as the implementation of master leases and epoch numbering [6].

**Acknowledgements.** We thank the ESOP 2018 reviewers for their feedback. This work was supported by ERC Starting Grant H2020-EU 714729 and EPSRC First Grant EP/P009271/1.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **On Parallel Snapshot Isolation and Release/Acquire Consistency**

Azalea Raad<sup>1(B)</sup>, Ori Lahav<sup>2</sup>, and Viktor Vafeiadis<sup>1</sup>

<sup>1</sup> MPI-SWS, Kaiserslautern, Germany {azalea,viktor}@mpi-sws.org <sup>2</sup> Tel Aviv University, Tel Aviv, Israel orilahav@tau.ac.il

**Abstract.** Parallel snapshot isolation (PSI) is a standard transactional consistency model used in databases and distributed systems. We argue that PSI is also a useful formal model for software transactional memory (STM) as it has certain advantages over other consistency models. However, the formal PSI definition is given declaratively by acyclicity axioms, which most programmers find hard to understand and reason about.

To address this, we develop a simple lock-based reference implementation for PSI built on top of the release-acquire memory model, a wellbehaved subset of the C/C++11 memory model. We prove that our implementation is sound and complete against its higher-level declarative specification.

We further consider an extension of PSI allowing transactional and non-transactional code to interact, and provide a sound and complete reference implementation for the more general setting. Supporting this interaction is necessary for adopting a transactional model in programming languages.

### **1 Introduction**

Following the widespread use of transactions in databases, *software transactional memory* (STM) [19,35] has been proposed as a programming language abstraction that can radically simplify the task of writing correct and efficient concurrent programs. It provides the illusion of blocks of code, called *transactions*, executing atomically and in isolation from any other such concurrent blocks.

In theory, STM is great for programmers as it allows them to concentrate on the high-level algorithmic steps of solving a problem and relieves them of such concerns as the low-level details of enforcing mutual exclusion. In practice, however, the situation is far from ideal as the semantics of transactions in the context of non-transactional code is not at all settled. Recent years have seen a plethora of different STM implementations [1–3,6,17,20], each providing a slightly different—and often unspecified—semantics to the programmer.

Simple models in the literature are lock-based, such as *global lock atomicity* (GLA) [28] (where a transaction must acquire a global lock prior to execution and

release it afterwards) and *disjoint lock atomicity* (DLA) [28] (where a transaction must acquire all locks associated with the locations it accesses prior to execution and release them afterwards), which provide *serialisable* transactions. That is, all transactions appear to have executed atomically one after another in some total order. The problem with these models is largely their implementation cost, as they impose too much synchronisation between transactions.

The database community has long recognised this performance problem and has developed weaker transactional models that do not guarantee serialisability. The most widely used such model is *snapshot isolation* (SI) [10], implemented by major databases, both centralised (e.g. Oracle and MS SQL Server) and distributed [16,30,33], as well as in STM [1,11,25,26]. In this article, we focus on a closely related model, *parallel snapshot isolation* (PSI) [36], which is known to provide better scalability and availability in large-scale geo-replicated systems. SI and PSI allow conflicting transactions to execute concurrently and to commit successfully, so long as they do not have a write-write conflict. This in effect allows reads of SI/PSI transactions to read from an earlier memory snapshot than the one affected by their writes, and permits outcomes such as the following:

$$\begin{array}{c} \text{Initially, } x = y = 0 \\ \text{T1:} \begin{bmatrix} x := 1; \\ a := y; \; \text{// reads 0} \end{bmatrix} \quad \big\| \quad \text{T2:} \begin{bmatrix} y := 1; \\ b := x; \; \text{// reads 0} \end{bmatrix} \end{array} \tag{SB+txs}$$

The above is also known as the *write skew* anomaly in the database literature [14]. Such outcomes are analogous to those allowed by weak memory models, such as x86-TSO [29,34] and C11 [9], for non-transactional programs. In this article, we consider—to the best of our knowledge for the first time—PSI as a possible model for STM, especially in the context of a concurrent language such as C/C++ with a weak memory model. In such contexts, programmers are already familiar with weak behaviours such as that exhibited by SB+txs above.

A key reason why PSI is more suitable for a programming language than SI (or other stronger models) is *performance*. This is analogous to why C/C++ adopted non-multi-copy-atomicity (allowing two different threads to observe a write by a third thread at different times) as part of their concurrency model. Consider the following "IRIW" (independent reads of independent writes) litmus test:

$$\begin{array}{c} \text{Initially, } x = y = 0 \\ \text{T1:} \begin{bmatrix} x := 1; \end{bmatrix} \;\big\|\; \text{T2:} \begin{bmatrix} a := x; \; \text{// reads 1} \\ b := y; \; \text{// reads 0} \end{bmatrix} \;\big\|\; \text{T3:} \begin{bmatrix} c := y; \; \text{// reads 1} \\ d := x; \; \text{// reads 0} \end{bmatrix} \;\big\|\; \text{T4:} \begin{bmatrix} y := 1; \end{bmatrix} \end{array} \tag{IRIW+txs}$$

In the annotated behaviour, transactions T2 and T3 disagree on the relative order of transactions T1 and T4. Under PSI, this behaviour (called the *long fork anomaly*) is allowed, as T1 and T4 are not ordered (they commit in parallel), but it is disallowed under SI. This intuitively means that SI must impose ordering guarantees even on transactions that do not access a common location, which can be rather costly in the context of a weakly consistent system.

A second reason why PSI is much more suitable than SI is that it enjoys better properties. A key intuitive property a programmer might expect of transactions is *monotonicity*. Suppose that in the (SB+txs) program we split the two transactions into four smaller ones as follows:

$$\begin{array}{c} \text{Initially, } x = y = 0 \\ \begin{array}{l} \text{T1:} \begin{bmatrix} x := 1; \end{bmatrix} \\ \text{T3:} \begin{bmatrix} a := y; \; \text{// reads 0} \end{bmatrix} \end{array} \quad \big\| \quad \begin{array}{l} \text{T2:} \begin{bmatrix} y := 1; \end{bmatrix} \\ \text{T4:} \begin{bmatrix} b := x; \; \text{// reads 0} \end{bmatrix} \end{array} \end{array} \tag{SB+txs+chop}$$

One might expect that if the annotated behaviour is allowed in (SB+txs), it should also be allowed in (SB+txs+chop). This indeed is the case for PSI, but not for SI! In fact, in the extreme case where every transaction contains a single access, SI provides serialisability. Nevertheless, PSI currently has two significant drawbacks, preventing its widespread adoption. We aim to address these here.

The first PSI drawback is that its formal semantics can be rather daunting for the uninitiated as it is defined declaratively in terms of acyclicity constraints. What is missing is perhaps a simple lock-based reference implementation of PSI, similar to the lock-based implementations of GLA and DLA, that the programmers can readily understand and reason about. As an added benefit, such an implementation can be viewed as an operational model, forming the basis for developing program logics for reasoning about PSI programs.

Although Cerone et al. [15] proved their declarative PSI specification equivalent to an implementation strategy of PSI in a distributed system with replicated storage over causal consistency, their implementation is not suitable for reasoning about *shared-memory* programs. In particular, it cannot help the programmers determine how transactional and non-transactional accesses may interact.

As our first contribution, in Sect. 4 we address this PSI drawback by providing a simple lock-based reference implementation that we prove equivalent to its declarative specification. Typically, one proves that an implementation is *sound* with respect to a declarative specification—i.e. every behaviour observable in the implementation is accounted for in the declarative specification. Here, we also want the other direction, known as *completeness*, namely that every behaviour allowed by the specification is actually possible in the implementation. Having a (simple) complete implementation is very useful for programmers, as it may be easier to understand and experiment with than the declarative specification.

Our reference implementation is built in the *release-acquire* fragment of the C/C++ memory model [8,9,21], using sequence locks [13,18,23,32] to achieve the correct transactional semantics.

The second PSI drawback is that its study so far has not accounted for the subtle effects of non-transactional accesses and how they interact with transactional accesses. While this scenario does not arise in 'closed world' systems such as databases, it is crucially important in languages such as C/C++ and Java, where one cannot afford the implementation cost of making every access transactional so that it is "strongly isolated" from other concurrent transactions.

Therefore, as our second contribution, in Sect. 5 we extend our basic reference implementation to make it robust under uninstrumented non-transactional accesses, and characterise declaratively the semantics we obtain. We call this extended model RPSI (for "robust PSI") and show that it gives reasonable semantics even under scenarios where transactional and non-transactional accesses are mixed.

*Outline.* The remainder of this article is organised as follows. In Sect. 2 we present an overview of our contributions and the necessary background information. In Sect. 3 we provide the formal model of the C11 release/acquire fragment and describe how we extend it to specify the behaviour of STM programs. In Sect. 4 we present our PSI reference implementation (without non-transactional accesses), demonstrating its soundness and completeness against the declarative PSI specification. In Sect. 5 we formulate a declarative specification for RPSI as an extension of PSI accounting for non-transactional accesses. We then present our RPSI reference implementation, demonstrating its soundness and completeness against our proposed declarative specification. We conclude and discuss future work in Sect. 6.

### **2 Background and Main Ideas**

One of the main differences between the specification of database transactions and those of STM is that STM specifications must additionally account for the interactions between *mixed-mode* (both transactional and non-transactional) accesses to the same locations. To characterise such interactions, Blundell et al. [12,27] proposed the notions of *weak* and *strong atomicity*, often referred to as weak and strong isolation. Weak isolation guarantees isolation only amongst transactions: the intermediate state of a transaction cannot affect or be affected by other transactions, but no such isolation is guaranteed with respect to non-transactional code (e.g. the accesses of a transaction may be interleaved with those of non-transactional code). By contrast, strong isolation additionally guarantees full isolation from non-transactional code. Informally, each non-transactional access is considered as a transaction with a single access. In what follows, we explore the design choices for implementing STMs under each isolation model (Sect. 2.1), provide an intuitive account of the PSI model (Sect. 2.2), and describe the key requirements for implementing PSI and how we meet them (Sect. 2.3).

### **2.1 Implementing Software Transactional Memory**

Implementing STMs under either strong or weak isolation models comes with a number of challenges. Implementing strongly isolated STMs requires a conflict detection/avoidance mechanism between transactional and non-transactional code. That is, unless non-transactional accesses are instrumented to adhere to the same access policies, conflicts involving non-transactional code cannot be detected. For instance, in order to guarantee strong isolation under the GLA model [28] discussed earlier, non-transactional code must be modified to acquire the global lock prior to each shared access and release it afterwards.

Implementing weakly-isolated STMs requires a careful handling of aborting transactions as their intermediate state may be observed by non-transactional code. Ideally, the STM implementation must ensure that the intermediate state of aborting transactions is not leaked to non-transactional code. A transaction may abort either because it failed to commit (e.g. due to a conflict), or because it encountered an explicit abort instruction in the transactional code. In the former case, leaks to non-transactional code can be avoided by pessimistic concurrency control (e.g. locks), pre-empting conflicts. In the latter case, leaks can be prevented either by lazy version management (where transactional updates are stored locally and propagated to memory only upon committing), or by disallowing explicit abort instructions altogether – an approach taken by the (weakly isolated) relaxed transactions of the C++ memory model [6].

As mentioned earlier, our aim in this work is to build an STM with PSI guarantees in the RA fragment of C11. As such, instrumenting non-transactional accesses is not feasible and thus our STM guarantees weak isolation. For simplicity, throughout our development we make a few simplifying assumptions: (i) transactions are not nested; (ii) the transactional code is without explicit abort instructions (as with the weakly-isolated transactions of C++ [6]); and (iii) the locations accessed by a transaction can be statically determined. For the latter, of course, a static over-approximation of the locations accessed suffices for the soundness of our implementations.

### **2.2 Parallel Snapshot Isolation (PSI)**

The initial model of PSI introduced in [36] is described informally in terms of a multi-version concurrent algorithm as follows. A transaction T at a replica r proceeds by taking an initial *snapshot* S of the shared objects in r. The execution of T is then carried out locally: read operations query S and write operations similarly update S. Once the execution of T is completed, it attempts to *commit* its changes to r and it succeeds *only if* it is not *write-conflicted*. Transaction T is write-conflicted if another *committed* transaction T′ has written to a location in r also written to by T, since T recorded its snapshot S. If T fails the conflict check it aborts and may restart the transaction; otherwise, it commits its changes to r, at which point its changes become visible to all other transactions that take a snapshot of replica r thereafter. These committed changes are later propagated to other replicas asynchronously.

The main difference between SI and PSI is in the way the committed changes at a replica r are propagated to other sites in the system. Under the SI model, committed transactions are *globally* ordered and the changes at each replica are propagated to others in this global order. This ensures that all concurrent transactions are observed in the same order by all replicas. By contrast, PSI does not enforce a global order on committed transactions: transactional effects are propagated between replicas in *causal* order. This ensures that, if replica r<sup>1</sup> commits a message m which is later read at replica r<sup>2</sup>, and r<sup>2</sup> posts a response m′, no replica can see m′ without having seen the original message m. However, causal propagation allows two replicas to observe concurrent events as if occurring in different orders: if r<sup>1</sup> and r<sup>2</sup> concurrently commit messages m and m′, then replica r<sup>3</sup> may initially see m but not m′, and r<sup>4</sup> may see m′ but not m. This is best illustrated by the (IRIW+txs) example in Sect. 1.

#### **2.3 Towards a Lock-Based Reference Implementation for PSI**

While the description of PSI above is suitable for understanding PSI, it is not very useful for integrating the PSI model in languages such as C, C++ or Java. From a programmer's perspective, in such languages the various threads directly access the shared memory; they do not access their own replicas, which are loosely related to the replicas of other threads. What we would therefore like is an equivalent description of PSI in terms of unreplicated accesses to shared memory and a synchronisation mechanism such as locks.

In effect, we want a definition similar in spirit to *global lock atomicity* (GLA) [28], which is arguably the simplest TM model, and models committed transactions as acquiring a global mutual exclusion lock, then accessing and updating the data in place, and finally releasing the global lock. Naturally, however, the implementation of PSI cannot be that simple.

A first observation is that PSI cannot be simply implemented over sequentially consistent (SC) shared memory.<sup>1</sup> To see this, consider the IRIW+txs program from the introduction. Although PSI allows the annotated behaviour, SC forbids it for the corresponding program without transactions. The point is that under SC, either the x := 1 or the y := 1 write first reaches memory. Suppose, without loss of generality, that x := 1 is written to memory before y := 1. Then, the possible atomic snapshots of memory are <sup>x</sup> <sup>=</sup> <sup>y</sup> = 0, <sup>x</sup> = 1 <sup>∧</sup> <sup>y</sup> = 0, and x = y = 1. In particular, the snapshot read by T3 is impossible.

To implement PSI we therefore resort to a weaker memory model. Among weak memory models, the "multi-copy-atomic" ones, such as x86-TSO [29,34], SPARC PSO [37,38] and ARMv8-Flat [31], also forbid the weak outcome of (IRIW+txs) in the same way as SC, and so are unsuitable for our purpose. We thus consider *release-acquire consistency* (RA) [8,9,21], a simple and well-behaved non-multi-copy-atomic model. It is readily available as a subset of the C/C++11 memory model [9] with verified compilation schemes to all major architectures.

RA provides a crucial property that is relied upon in the earlier description of PSI, namely *causality*. In terms of RA, this means that if thread A observes a write w of thread B, then it also observes all the previous writes of thread B as well as any other writes B observed before performing w.

A second observation is that using a single lock to enforce mutual exclusion does not work as we need to allow transactions that access disjoint sets of locations to complete in parallel. An obvious solution is to use multiple locks—one

<sup>1</sup> *Sequential consistency* (SC) [24] is the standard model for shared memory concurrency and defines the behaviours of a multi-threaded program as those arising by executing sequentially some interleaving of the accesses of its constituent threads.

per location—as in the *disjoint lock atomicity* (DLA) model [28]. The question remaining is how to implement taking a snapshot at the beginning of a transaction.

A naive attempt is to use reader/writer locks, which allow multiple readers (taking the snapshots) to run in parallel, as long as no writer has acquired the lock. In more detail, the idea is to acquire reader locks for all locations read by a transaction, read the locations and store their values locally, and then release the reader locks. However, as we describe shortly, this approach does not work. Consider the (IRIW+txs) example in Sect. 1. For T2 to get the annotated outcome, it must release its reader lock for y before T4 acquires it. Likewise, since T3 observes y = 1, it must acquire its reader lock for y after T4 releases it. By this point, however, it is transitively after the release of the y lock by T2, and so, because of causality, it must have observed all the writes observed by T2 by that point—namely, the x := 1 write. In essence, the problem is that reader-writer locks over-synchronise. When two threads acquire the same reader lock, they synchronise, whereas two read-only transactions should never synchronise in PSI.

To resolve this problem, we use *sequence locks* [13,18,23,32]. Under the sequence locking protocol, each location x is associated with a sequence (version) number vx, initialised to zero. Each write to x increments vx before and after its update, provided that vx is even upon the first increment. Each read from x checks vx before and after reading x. If both values are the same and even, then there cannot have been any concurrent increments, and the reader must have seen a consistent value. That is, read(x) ≜ do { v := vx; s := x } while (is-odd(v) || vx != v); return s. Under SC, sequence locks are equivalent to reader-writer locks; however, under RA, they are weaker exactly because readers do not synchronise.

**Handling Non-transactional Accesses.** Let us consider what happens if some of the data accessed by a transaction is modified concurrently by an atomic non-transactional write. Since non-transactional accesses do not acquire any locks, the snapshots taken can include values written by non-transactional accesses. The result of the snapshot then depends on the order in which the variables are read. Consider for example the following litmus test:

$$\begin{array}{c} \begin{array}{l} x := 1; \\ y := 1; \end{array} \quad \big\| \quad \text{T:} \begin{bmatrix} a := y; \; \text{// reads 1} \\ b := x; \; \text{// reads 0} \end{bmatrix} \end{array}$$

In our implementation, if the transaction's snapshot reads y before x, then the annotated weak behaviour is not possible, because the underlying model (RA) disallows the weak "message passing" behaviour. If, however, x is read before y by the snapshot, then the weak behaviour is possible. In essence, this means that the PSI implementation described so far is of little use when there are races between transactional and non-transactional code.

Another problem is the lack of *monotonicity*. A programmer might expect that wrapping some code in a transaction block will never yield additional behaviours not possible in the program without transactions. Yet, in this example, removing the T block and unwrapping its code gets rid of the annotated weak behaviour!

To get monotonicity, it seems that snapshots must read the variables in the same order they are accessed by the transactions. How can this be achieved for transactions that, say, read x, then y, and then x again? Or transactions that, depending on some complex condition, access first x and then y, or vice versa? The key to solving this conundrum is surprisingly simple: *read each variable twice*. In more detail, one takes two snapshots of the locations read by the transaction, and checks that both snapshots return the same values for each location. This ensures that every location is read both before and after every other location in the transaction, and hence all the high-level happens-before orderings in executions of the transactional program are also respected by its implementation.

There is however one caveat: since equality of values is used to determine whether the two snapshots are the same, we will miss cases where different non-transactional writes to a variable write the same value. In our formal development (see Sect. 5), we thus assume that if multiple non-transactional writes write the same value to the same location, they cannot race with the same transaction. This assumption is necessary for the soundness of our implementation and cannot be lifted without instrumenting non-transactional accesses.

### **3 The Release-Acquire Memory Model for STM**

We present the notational conventions used in the remainder of this article and proceed with the declarative model of the *release-acquire* (RA) fragment [21] of the C11 memory model [9], in which we implement our STM. In Sect. 3.1 we describe how we extend this formal model to specify the behaviour of STM programs.

*Notation.* Given a relation $r$ on a set $A$, we write $r^?$, $r^+$ and $r^*$ for the reflexive, transitive and reflexive-transitive closures of $r$, respectively. We write $r^{-1}$ for the inverse of $r$; $r|_A$ for $r \cap A^2$; $[A]$ for the identity relation on $A$, i.e. $\{(a,a) \mid a \in A\}$; $\mathsf{irreflexive}(r)$ for $\neg\exists a.\ (a,a) \in r$; and $\mathsf{acyclic}(r)$ for $\mathsf{irreflexive}(r^+)$. Given two relations $r_1$ and $r_2$, we write $r_1; r_2$ for their (left) relational composition, i.e. $\{(a,b) \mid \exists c.\ (a,c) \in r_1 \land (c,b) \in r_2\}$. Lastly, when $r$ is a strict partial order, we write $r|_{\mathsf{imm}}$ for the *immediate* edges in $r$: $\{(a,b) \in r \mid \neg\exists c.\ (a,c) \in r \land (c,b) \in r\}$.

The RA model is given by the fragment of the C11 memory model, where all read accesses are acquire (acq) reads, all writes are release (rel) writes, and all atomic updates (i.e. RMWs) are acquire-release (acqrel) updates. The semantics of a program under RA is defined as a set of *consistent executions*.

**Definition 1 (Executions in RA).** Assume a finite set of *locations* Loc; a finite set of *values* Val; and a finite set of *thread identifiers* TId. Let x, y, z range over locations, v over values and τ over thread identifiers. An *RA execution graph of an STM implementation*, *G*, is a tuple of the form (*E*, po, rf, mo) with its nodes given by *E* and its edges given by the po, rf and mo relations such that:

• $E \subset \mathbb{N}$ is a finite set of *events*, accompanied by the functions $\mathsf{tid}(.) : E \to \mathsf{TId}$ and $\mathsf{lab}(.) : E \to \mathsf{Label}$, returning the thread identifier and the label of an event, respectively. We typically use $a$, $b$, and $e$ to range over events. The label of an event is a tuple of one of the following three forms: (i) $\mathtt{R}(x, v)$ for *read* events; (ii) $\mathtt{W}(x, v)$ for *write* events; or (iii) $\mathtt{U}(x, v, v')$ for *update* events. The $\mathsf{lab}(.)$ function induces the functions $\mathsf{typ}(.)$, $\mathsf{loc}(.)$, $\mathsf{val_r}(.)$ and $\mathsf{val_w}(.)$ that respectively project the type ($\mathtt{R}$, $\mathtt{W}$ or $\mathtt{U}$), location, and read/written values of an event, where applicable. The set of *read events* is denoted by $\mathcal{R} \triangleq \{e \in E \mid \mathsf{typ}(e) \in \{\mathtt{R}, \mathtt{U}\}\}$; similarly, the set of *write events* is denoted by $\mathcal{W} \triangleq \{e \in E \mid \mathsf{typ}(e) \in \{\mathtt{W}, \mathtt{U}\}\}$; and the set of *update events* is denoted by $\mathcal{U} \triangleq \mathcal{R} \cap \mathcal{W}$.

We further assume that *E* always contains a set $E_0$ of initialisation events, consisting of a write event with label $\mathtt{W}(x, 0)$ for every $x \in \mathsf{Loc}$.


We often use "$G.$" as a prefix to project the various components of $G$ (e.g. $G.E$). Given a relation $r \subseteq E \times E$, we write $r_{loc}$ for $r \cap \{(a,b) \mid \mathsf{loc}(a) = \mathsf{loc}(b)\}$. Analogously, given a set $A \subseteq E$, we write $A_x$ for $A \cap \{a \mid \mathsf{loc}(a) = x\}$. Lastly, given the rf and mo relations, we define the 'reads-before' relation $\mathsf{rb} \triangleq \mathsf{rf}^{-1}; \mathsf{mo} \setminus [E]$.

Executions of a given program represent traces of shared memory accesses generated by the program. We only consider "partitioned" programs of the form $\|_{\tau \in \mathsf{TId}}\ c_\tau$, where $\|$ denotes parallel composition, and each $c_\tau$ is a sequential program. The set of executions associated with a

**Fig. 1.** An RA-consistent execution of a transaction-free variant of (IRIW+txs) in Sect. 1, with program outcome a = c = 1 and b = d = 0.

given program is then defined by induction over the structure of sequential programs. We do not define this construction formally as it depends on the syntax of the implementation programming language. Each execution of a program P has a particular program *outcome*, prescribing the final values of local variables in each thread (see example in Fig. 1).

In this initial stage, the execution outcomes are unrestricted in that there are no constraints on the rf and mo relations. Such constraints, and thus the permitted outcomes of a program, are determined by the set of *consistent* executions:

**Definition 2 (RA-consistency).** A program execution *G* is *RA-consistent*, written RA-consistent(*G*), if $\mathsf{acyclic}(\mathsf{hb}_{loc} \cup \mathsf{mo} \cup \mathsf{rb})$ holds, where $\mathsf{hb} \triangleq (\mathsf{po} \cup \mathsf{rf})^+$ denotes the 'RA-happens-before' relation.

Among all executions of a given program P, only the *RA-consistent* ones define the allowed outcomes of P.

### **3.1 Software Transactional Memory in RA: Specification**

Our goal in this section is to develop a declarative framework that allows us to specify the behaviour of mixed-mode STM programs under weak isolation guarantees. Whilst the behaviour of transactional code is dictated by the particular isolation model considered (e.g. PSI), the behaviour of non-transactional code and its interaction with transactions is guided by the underlying memory model. As we build our STM in the RA fragment of C11, we assume the behaviour of non-transactional code to conform to the RA memory model. More concretely, we build our specification of a program P such that (i) in the absence of transactional code, the behaviour of P is as defined by the RA model; (ii) in the absence of non-transactional code, the behaviour of P is as defined by the PSI model.

**Definition 3 (Specification Executions).** Assume a finite set of *transaction identifiers* TXId. An *execution graph of an STM specification*, <sup>Γ</sup>, is a tuple of the form (*E*, po,rf, mo, <sup>T</sup> ) where:


We write $\mathcal{T}/\mathsf{st}$ for the set of equivalence classes of $\mathcal{T}$ induced by st; $[a]_{\mathsf{st}}$ for the equivalence class that contains $a$; and $\mathcal{T}_\xi$ for the equivalence class of transaction $\xi \in \mathsf{TXId}$: $\mathcal{T}_\xi \triangleq \{a \mid \mathsf{tx}(a) = \xi\}$. We write $\mathcal{NT}$ for non-transactional events: $\mathcal{NT} \triangleq E \setminus \mathcal{T}$. We often use "$\Gamma.$" as a prefix to project the $\Gamma$ components.

*Specification Consistency.* The consistency of specification graphs is model-specific in that it is dictated by the guarantees provided by the underlying model. In the upcoming sections, we present two consistency definitions of PSI in terms of our specification graphs that lack cycles of certain shapes. In doing so, we often write $r_{\mathrm{T}}$ for the lifting of a relation $r \subseteq E \times E$ to transaction classes: $r_{\mathrm{T}} \triangleq \mathsf{st}; (r \setminus \mathsf{st}); \mathsf{st}$. Analogously, we write $r_{\mathrm{I}}$ for the restriction of $r$ to the internal events of a transaction: $r_{\mathrm{I}} \triangleq r \cap \mathsf{st}$.

*Comparison to Dependency Graphs.* Adya et al. proposed *dependency graphs* for the declarative specification of transactional consistency models [5,7]. Dependency graphs are similar to our specification graphs in that they are constructed from a set of nodes and a set of edges (relations) capturing certain dependencies. However, unlike our specification graphs, the nodes in dependency graphs denote entire transactions and not individual events. In particular, Adya et al. propose three types of dependency edges: (i) a *read dependency* edge, $T_1 \xrightarrow{WR} T_2$, denotes that transaction $T_2$ reads a value written by $T_1$; (ii) a *write dependency* edge, $T_1 \xrightarrow{WW} T_2$, denotes that $T_2$ overwrites a value written by $T_1$; and (iii) an *anti-dependency* edge, $T_1 \xrightarrow{RW} T_2$, denotes that $T_2$ overwrites a value read by $T_1$. Adya's formalism does not allow for *non-transactional* accesses, and it thus suffices to define the dependencies of an execution as edges between transaction classes. In our specification graphs, however, we account for both transactional and non-transactional accesses, and thus define our relational dependencies between individual events of an execution. When we need to relate an entire transaction to another with relation $r$, we use the transactional lift ($r_{\mathrm{T}}$) defined above. In particular, Adya's dependency edges correspond to ours as follows: informally, $\xrightarrow{WR}$ corresponds to our $\mathsf{rf}_{\mathrm{T}}$; $\xrightarrow{WW}$ to our $\mathsf{mo}_{\mathrm{T}}$; and $\xrightarrow{RW}$ to our $\mathsf{rb}_{\mathrm{T}}$. Adya's dependency graphs have been used to develop declarative specifications of the PSI consistency model [14]. In Sect. 4, we revisit this model, redefine it as specification graphs in our setting, and develop a reference lock-based implementation that is sound and complete with respect to this abstract specification.
The model in [14] does not account for non-transactional accesses. To remedy this, later in Sect. 5, we develop a declarative specification of PSI that allows for both transactional and non-transactional accesses. We then develop a reference lock-based implementation that is sound and complete with respect to our proposed model.

### **4 Parallel Snapshot Isolation (PSI)**

We present a declarative specification of PSI (Sect. 4.1), and develop a lock-based reference implementation of PSI in the RA fragment (Sect. 4.2). We then demonstrate that our implementation is both sound (Sect. 4.3) and complete (Sect. 4.4) with respect to the PSI specification. Note that the PSI model in this section accounts for transactional code only; that is, throughout this section we assume that $\Gamma.E = \Gamma.\mathcal{T}$. We lift this assumption later in Sect. 5.

#### **4.1 A Declarative Specification of PSI STMs in RA**

In order to formally characterise the weak behaviour and anomalies admitted by PSI, Cerone and Gotsman [14,15] formulated a declarative PSI specification. (In fact, they provide two equivalent specifications: one using dependency graphs proposed by Adya et al. [5,7]; and the other using abstract executions.) As is standard, they characterise the set of executions admitted under PSI as graphs that lack certain cycles. We present an equivalent declarative formulation of PSI, adapted to use our notation as discussed in Sect. 3. It is straightforward to verify that our definition coincides with the dependency graph specification in [15]. As with [14,15], throughout this section, we take PSI execution graphs to be those in which $E = \mathcal{T} \subseteq (\mathcal{R} \cup \mathcal{W}) \setminus \mathcal{U}$. That is, the PSI model handles transactional code only, consisting solely of read and write events (excluding updates).

*PSI Consistency.* A PSI execution graph $\Gamma = (E, \mathsf{po}, \mathsf{rf}, \mathsf{mo}, \mathcal{T})$ is *consistent*, written psi-consistent($\Gamma$), if the following hold:

• $\mathsf{rf}_{\mathrm{I}} \cup \mathsf{mo}_{\mathrm{I}} \cup \mathsf{rb}_{\mathrm{I}} \subseteq \mathsf{po}$ (int)

• $\mathsf{irreflexive}((\mathsf{po}_{\mathrm{T}} \cup \mathsf{rf}_{\mathrm{T}} \cup \mathsf{mo}_{\mathrm{T}})^+ ; \mathsf{rb}_{\mathrm{T}}^{\,?})$ (ext)

Informally, int ensures the consistency of each transaction internally, while ext provides the synchronisation guarantees among transactions. In particular, we note that the two conditions together ensure that if two read events in the same transaction read from the same location x, and no write to x is po-between them, then they must read from the same write (known as 'internal read consistency').

Next, we provide an alternative formulation of PSI-consistency that is closer in form to RA-consistency. This formulation is the basis of our extension in Sect. 5 with non-transactional accesses.

**Lemma 1.** *A PSI execution graph* $\Gamma = (E, \mathsf{po}, \mathsf{rf}, \mathsf{mo}, \mathcal{T})$ *is consistent if and only if* $\mathsf{acyclic}(\text{psi-hb}_{loc} \cup \mathsf{mo} \cup \mathsf{rb})$ *holds, where* psi-hb *denotes the* 'PSI-happens-before' *relation, defined as* $\text{psi-hb} \triangleq (\mathsf{po} \cup \mathsf{rf} \cup \mathsf{rf}_{\mathrm{T}} \cup \mathsf{mo}_{\mathrm{T}})^+$*.*

*Proof.* The full proof is provided in the technical appendix [4].

Note that this acyclicity condition is rather close to that of the RA-consistency definition presented in Sect. 3, the sole difference being the 'happens-before' relation: hb is replaced with psi-hb. The relation psi-hb is a strict extension of hb with $\mathsf{rf}_{\mathrm{T}} \cup \mathsf{mo}_{\mathrm{T}}$, which captures the additional synchronisation guarantees resulting from transaction orderings, as described shortly. As in RA-consistency, po and rf are included in the 'PSI-happens-before' relation psi-hb; additionally, $\mathsf{rf}_{\mathrm{T}}$ and $\mathsf{mo}_{\mathrm{T}}$ also contribute to psi-hb.

Intuitively, $\mathsf{rf}_{\mathrm{T}}$ corresponds to synchronisation due to causality between transactions. A transaction $T_1$ is causally-ordered before transaction $T_2$ if $T_1$ writes to x and $T_2$ later (in 'happens-before' order) reads x. The inclusion of $\mathsf{rf}_{\mathrm{T}}$ ensures that $T_2$ cannot read from $T_1$ without observing its entire effect. This in turn ensures that transactions exhibit an atomic 'all-or-nothing' behaviour. In particular, transactions cannot mix-and-match the values they read.

```
0. for (x ∈ WS) lock vx;
1. for (x ∈ RS) {
2.   a := vx;
3.   if (is-odd(a) && x ∉ WS) goto line 2;
4.   if (x ∉ WS) v[x] := a;
5.   s[x] := x; }
6. for (x ∈ RS)
7.   if (¬valid(x)) goto line 1;
8. ⟦T⟧;
9. for (x ∈ WS) unlock vx;

lock vx   ≜ retry: v[x] := vx;
                   if (is-odd(v[x])) goto retry;
                   if (!CAS(vx, v[x], v[x]+1)) goto retry;

unlock vx ≜ vx := v[x] + 2

valid(x)     ≜ vx == v[x]
validRPSI(x) ≜ vx == v[x] && x == s[x]

⟦a := x⟧ ≜ a := s[x]
⟦x := a⟧ ≜ x := a; s[x] := a
⟦S1; S2⟧ ≜ ⟦S1⟧; ⟦S2⟧
⟦while(e) S⟧ ≜ while(e) ⟦S⟧
... and so on ...
```
**Fig. 2.** PSI implementation of transaction T given RS, WS; the RPSI implementation (Sect. 5) is obtained by replacing valid on line 7 with validRPSI.

For instance, if $T_1$ writes to both x and y, transaction $T_2$ may not read the value of x from $T_1$ while reading the value of y from an earlier (in 'happens-before' order) transaction $T_0$.

The $\mathsf{mo}_{\mathrm{T}}$ relation corresponds to synchronisation due to conflicts between transactions. Its inclusion enforces the write-conflict-freedom of PSI transactions. In other words, if two transactions $T_1$ and $T_2$ both write to the same location x via events $w_1$ and $w_2$ such that $w_1 \xrightarrow{\mathsf{mo}} w_2$, then $T_1$ must commit before $T_2$, and thus the entire effect of $T_1$ must be visible to $T_2$.

### **4.2 A Lock-Based PSI Implementation in RA**

We present an operational model of PSI that is both sound and complete with respect to the declarative semantics in Sect. 4.1. To this end, in Fig. 2 we develop a pessimistic (lock-based) reference implementation of PSI using sequence locks [13,18,23,32], referred to as *version locks* in our implementation. In order to avoid taking a snapshot of the *entire* memory and thus decrease the locking overhead, we assume that a transaction T is supplied with its *read set*, RS, containing those locations that are read by T. Similarly, we assume T to be supplied with its *write set*, WS, containing the locations updated by T.<sup>2</sup>

The implementation of T proceeds by exclusively acquiring the version locks on all locations in its write set (line 0). It then obtains a snapshot of the locations in its read set by inspecting their version locks, as described shortly, and subsequently recording their values in a thread-local array s (lines 1–7). Once a snapshot is recorded, the execution of T proceeds locally (via ⟦T⟧ on line 8) as

<sup>2</sup> A conservative estimate of RS and WS can be obtained by simple syntactic analysis.

follows. Each read operation consults the local snapshot in s; each write operation updates the memory eagerly (in-place) and subsequently updates its local snapshot to ensure correct lookup for future reads. Once the execution of T is concluded, the version locks on the write set are released (line 9). Observe that as the writer locks are acquired pessimistically, we do not need to check for write-conflicts in the implementation.

To facilitate our locking implementation, we assume that each location x is associated with a version lock at address x+1, written vx. The value held by a version lock vx may be in one of two categories: (i) an even number, denoting that the lock is free; or (ii) an odd number, denoting that the lock is exclusively held by a writer. For a transaction to write to a location x in its write set WS, the x version lock (vx) must be acquired exclusively by calling lock vx. Each call to lock vx reads the value of vx and stores it in v[x], where v is a thread-local array. It then checks if the value read is even (vx is free) and if so it atomically increments it by 1 (with a 'compare-and-swap' operation), thus changing the value of vx to an odd number and acquiring it exclusively; otherwise it repeats this process until the version lock is successfully acquired. Conversely, each call to unlock vx updates the value of vx to v[x]+2, restoring the value of vx to an even number and thus releasing it. Note that deadlocks can be avoided by imposing an ordering on locks and ensuring their in-order acquisition by all transactions. For simplicity however, we have elided this step as we are not concerned with progress or performance issues here and our main objective is a reference implementation of PSI in RA.

Analogously, for a transaction to read from the locations in its read set RS, it must record a snapshot of their values (lines 1–7). To obtain a snapshot of location x, the transaction must ensure that x is not currently being written to by another transaction. It thus proceeds by reading the value of vx and recording it in v[x]. If vx is free (the value read is even) or x is in its write set WS, the value of x can be freely read and tentatively stored in s[x]. In the latter case, the transaction has already acquired the exclusive lock on vx and is thus safe in the knowledge that no other transaction is currently updating x. Once a *tentative* snapshot of all locations is obtained (lines 1–5), the transaction must *validate* it by ensuring that it reflects the values of the read set at a single point in time (lines 6–7). To do this, it revisits the version locks, inspecting whether their values have changed (by checking them against v) since it recorded its snapshot. If so, then an intermediate update has intervened, potentially invalidating the obtained snapshot; the transaction thus restarts the snapshot process. Otherwise, the snapshot is successfully validated and returned in s.

#### **4.3 Implementation Soundness**

The PSI implementation in Fig. 2 is *sound*: for each RA-consistent implementation graph *G*, a corresponding specification graph Γ can be constructed such that psi-consistent(Γ) holds. In what follows we state our soundness theorem and briefly describe our construction of consistent specification graphs. We refer the reader to the technical appendix [4] for the full soundness proof.

**Theorem 1 (Soundness).** *For all RA-consistent implementation graphs G of the implementation in Fig. 2, there exists a PSI-consistent specification graph* Γ *of the corresponding transactional program that has the same program outcome.*

**Constructing Consistent Specification Graphs.** Observe that given an execution of our implementation with $t$ transactions, the trace of each transaction $i \in \{1 \cdots t\}$ is of the form $\theta_i = Ls_i \xrightarrow{\mathsf{po}} FS_i \xrightarrow{\mathsf{po}} S_i \xrightarrow{\mathsf{po}} Ts_i \xrightarrow{\mathsf{po}} Us_i$, where $Ls_i$, $FS_i$, $S_i$, $Ts_i$ and $Us_i$ respectively denote the sequences of events acquiring the version locks, attempting but failing to obtain a valid snapshot, recording a valid snapshot, performing the transactional operations, and releasing the version locks. For each transactional trace $\theta_i$ of our implementation, we thus construct a corresponding trace of the specification as $\theta'_i = B_i \xrightarrow{\mathsf{po}} Ts'_i \xrightarrow{\mathsf{po}} E_i$, where $B_i$ and $E_i$ denote the transaction begin and end events ($\mathsf{lab}(B_i) = \mathtt{B}$ and $\mathsf{lab}(E_i) = \mathtt{E}$). When $Ts_i$ is of the form $t_1 \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} t_n$, we construct $Ts'_i$ as $t'_1 \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} t'_n$, with each $t'_j$ defined either as $t'_j \triangleq \mathtt{R}(x, v)$ when $t_j = \mathtt{R}(s[x], v)$ (i.e. the corresponding implementation event is a read event); or as $t'_j \triangleq \mathtt{W}(x, v)$ when $t_j = \mathtt{W}(x, v) \xrightarrow{\mathsf{po}} \mathtt{W}(s[x], v)$.

For each specification trace θ <sup>i</sup> we construct the 'reads-from' relation as:

$$\mathsf{RF}_i \triangleq \left\{ (w, t'_j) \;\middle|\; \begin{array}{l} t'_j \in Ts'_i \land \exists x, v.\ t'_j = \mathtt{R}(x, v) \land w = \mathtt{W}(x, v) \\ \land\ \big(w \in Ts'_i \Rightarrow w \xrightarrow{\mathsf{po}} t'_j \\ \qquad \land\ \forall e \in Ts'_i.\ w \xrightarrow{\mathsf{po}} e \xrightarrow{\mathsf{po}} t'_j \Rightarrow (\mathsf{loc}(e) \neq x \lor e \notin \mathcal{W})\big) \\ \land\ \big(w \notin Ts'_i \Rightarrow (\forall e \in Ts'_i.\ e \xrightarrow{\mathsf{po}} t'_j \Rightarrow (\mathsf{loc}(e) \neq x \lor e \notin \mathcal{W})) \\ \qquad \land\ \exists r' \in S_i.\ \mathsf{loc}(r') = x \land (w, r') \in G.\mathsf{rf}\big) \end{array} \right\}$$

That is, we construct our graph such that each read event $t'_j$ from location x in $Ts'_i$ either (i) is preceded by a write event $w$ to x in $Ts'_i$ without an intermediate write in between them, and thus 'reads-from' $w$ (lines two and three); or (ii) is not preceded by a write event in $Ts'_i$, and thus 'reads-from' the write event $w$ from which the initial snapshot read $r'$ in $S_i$ obtained the value of x (last two lines).

Given a consistent implementation graph *G* = (*E*, po,rf, mo), we construct a consistent specification graph <sup>Γ</sup> = (*E*, po,rf, mo, <sup>T</sup> ) such that:


#### **4.4 Implementation Completeness**

The PSI implementation in Fig. 2 is *complete*: for each consistent specification graph Γ a corresponding implementation graph *G* can be constructed such that RA-consistent(*G*) holds. We next state our completeness theorem and describe our construction of consistent implementation graphs. We refer the reader to the technical appendix [4] for the full completeness proof.

**Theorem 2 (Completeness).** *For all PSI-consistent specification graphs* Γ *of a transactional program, there exists an RA-consistent execution graph G of the implementation in Fig. 2 that has the same program outcome.*

**Constructing Consistent Implementation Graphs.** In order to construct an execution graph of the implementation *G* from the specification Γ, we follow similar steps as those in the soundness construction, in reverse order. More concretely, given each trace θ <sup>i</sup> of the specification, we construct an analogous trace of the implementation by inserting the appropriate events for acquiring and inspecting the version locks, as well as obtaining a snapshot. For each transaction class <sup>T</sup><sup>i</sup> ∈ T /st, we must first determine its read and write sets and subsequently decide the order in which the version locks are acquired (for locations in the write set) and inspected (for locations in the read set). This then enables us to construct the 'reads-from' and 'modification-order' relations for the events associated with version locks.

Given a consistent execution graph of the specification $\Gamma = (E, \mathsf{po}, \mathsf{rf}, \mathsf{mo}, \mathcal{T})$, and a transaction class $T_i \in \Gamma.\mathcal{T}/\mathsf{st}$, we write $\mathsf{WS}_{T_i}$ for the set of locations written to by $T_i$; that is, $\mathsf{WS}_{T_i} \triangleq \bigcup_{e \in T_i \cap \mathcal{W}} \mathsf{loc}(e)$. Similarly, we write $\mathsf{RS}_{T_i}$ for the set of locations read from by $T_i$ *prior to* being written to by $T_i$. For each location x read from by $T_i$, we additionally record the first read event in $T_i$ that retrieved the value of x. That is,

$$\mathsf{RS}_{T_i} \triangleq \left\{ (x, r) \;\middle|\; r \in T_i \cap \mathcal{R}_x \land \neg\exists e \in T_i \cap E_x.\ e \xrightarrow{\mathsf{po}} r \right\}$$

Note that transaction <sup>T</sup><sup>i</sup> may contain several read events reading from <sup>x</sup>, prior to subsequently updating it. However, the internal-read-consistency property ensures that all such read events read from the same write event. As such, as part of the read set of T<sup>i</sup> we record the first such read event (in program-order).

Determining the ordering of lock events hinges on the following observation. Given a consistent execution graph of the specification $\Gamma = (E, \mathsf{po}, \mathsf{rf}, \mathsf{mo}, \mathcal{T})$, let, for each location x, the total order mo on the writes to x be given as $w_1 \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \cdots \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} w_{n_x}$. Observe that this order can be broken into adjacent segments where the events of each segment belong to the *same* transaction. That is, given the transaction classes $\Gamma.\mathcal{T}/\mathsf{st}$, the order above is of the following form, where $T_1, \cdots, T_m \in \Gamma.\mathcal{T}/\mathsf{st}$ and for each such $T_i$ we have $x \in \mathsf{WS}_{T_i}$ and $w_{(i,1)}, \cdots, w_{(i,n_i)} \in T_i$:

$$\underbrace{w_{(1,1)} \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \cdots \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} w_{(1,n_1)}}_{T_1} \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \cdots \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \underbrace{w_{(m,1)} \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \cdots \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} w_{(m,n_m)}}_{T_m}$$

Were this not the case, and we had $w_1 \xrightarrow{\mathsf{mo}} w \xrightarrow{\mathsf{mo}} w_2$ such that $w_1, w_2 \in T_i$ and $w \in T_j \neq T_i$, we would consequently have $w_1 \xrightarrow{\mathsf{mo}_{\mathrm{T}}} w \xrightarrow{\mathsf{mo}_{\mathrm{T}}} w_1$, contradicting the assumption that $\Gamma$ is consistent. Given the above order, let us then define $\Gamma.\mathsf{MO}_x = [T_1 \cdots T_m]$. We write $\Gamma.\mathsf{MO}_x|_i$ for the $i$th item of $\Gamma.\mathsf{MO}_x$. As we describe shortly, we use $\Gamma.\mathsf{MO}_x$ to determine the order of lock events.

Note that the execution trace for each transaction $T_i \in \Gamma.\mathcal{T}/\mathsf{st}$ is of the form $\theta'_i = B_i \xrightarrow{\mathsf{po}} Ts'_i \xrightarrow{\mathsf{po}} E_i$, where $B_i$ is a transaction-begin ($\mathtt{B}$) event, $E_i$ is a transaction-end ($\mathtt{E}$) event, and $Ts'_i = t'_1 \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} t'_n$ for some $n$, where each $t'_j$ is either a read or a write event. As such, we have $\Gamma.E = \Gamma.\mathcal{T} = \bigcup_{T_i \in \Gamma.\mathcal{T}/\mathsf{st}} T_i = \bigcup_i \theta'_i.E$.

For each trace $\theta'_i$ of the specification, we construct a corresponding trace of our implementation $\theta_i$ as follows. Let $\mathsf{RS}_{T_i} = \{(x_1, r_1) \cdots (x_p, r_p)\}$ and $\mathsf{WS}_{T_i} = \{y_1 \cdots y_q\}$. We then construct $\theta_i = Ls_i \xrightarrow{\mathsf{po}} S_i \xrightarrow{\mathsf{po}} Ts_i \xrightarrow{\mathsf{po}} Us_i$, where:

• $Ls_i = L^{y_1}_i \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} L^{y_q}_i$ and $Us_i = U^{y_1}_i \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} U^{y_q}_i$ denote the sequences of events acquiring and releasing the version locks, respectively. Each $L^{y_j}_i$ and $U^{y_j}_i$ is defined as follows: the first event $L^{y_1}_i$ has the same identifier as that of $B_i$, the last event $U^{y_q}_i$ has the same identifier as that of $E_i$, and the identifiers of the remaining events are picked fresh:

$$L^{y_j}_i = \mathtt{U}(v{y_j}, 2a, 2a{+}1) \qquad U^{y_j}_i = \mathtt{W}(v{y_j}, 2a{+}2) \qquad \text{where } \Gamma.\mathsf{MO}_{y_j}\big|_a = T_i$$

We then define the mo relation for version locks such that if transaction $T_i$ writes to y immediately after $T_j$ (i.e. $T_i$ is $\mathsf{MO}_y$-ordered immediately after $T_j$), then $T_i$ acquires the $vy$ version lock immediately after $T_j$ has released it. On the other hand, if $T_i$ is the first transaction to write to y, then it acquires $vy$ immediately after the event initialising the value of $vy$, written $\mathit{init}_{vy}$. Moreover, each $vy$ release event of $T_i$ is mo-ordered immediately after the corresponding $vy$ acquisition event in $T_i$:

$$\mathsf{IMO}_i \triangleq \bigcup_{\mathtt{y} \in \mathsf{WS}_{T_i}} \left\{ (L^{\mathtt{y}}_i, U^{\mathtt{y}}_i),\, (w, L^{\mathtt{y}}_i) \;\middle|\; \begin{array}{l} \big(\Gamma.\mathsf{MO}_{\mathtt{y}}\big|_0 = T_i \Rightarrow w = \mathit{init}_{\mathtt{vy}}\big) \wedge {} \\ \big(\exists T_j, a{>}0.\ \Gamma.\mathsf{MO}_{\mathtt{y}}\big|_a = T_i \wedge \Gamma.\mathsf{MO}_{\mathtt{y}}\big|_{a-1} = T_j \Rightarrow w = U^{\mathtt{y}}_j\big) \end{array} \right\}$$

This partial mo order on the lock events of $T_i$ also determines the rf relation for its lock acquisition events: $\mathsf{IRF}^1_i \triangleq \bigcup_{\mathtt{y} \in \mathsf{WS}_{T_i}} \big\{ (w, L^{\mathtt{y}}_i) \;\big|\; (w, L^{\mathtt{y}}_i) \in \mathsf{IMO}_i \big\}$.

• $S_i = \mathit{tr}^{\mathtt{x}_1}_i \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} \mathit{tr}^{\mathtt{x}_p}_i \xrightarrow{\mathsf{po}} \mathit{vr}^{\mathtt{x}_1}_i \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} \mathit{vr}^{\mathtt{x}_p}_i$ denotes the sequence of events obtaining a tentative snapshot ($\mathit{tr}^{\mathtt{x}_j}_i$) and subsequently validating it ($\mathit{vr}^{\mathtt{x}_j}_i$). Each $\mathit{tr}^{\mathtt{x}_j}_i$ sequence is defined as $\mathit{ir}^{\mathtt{x}_j}_i \xrightarrow{\mathsf{po}} r^{\mathtt{x}_j}_i \xrightarrow{\mathsf{po}} s^{\mathtt{x}_j}_i$ (reading the version lock $\mathtt{vx}_j$, reading $\mathtt{x}_j$ and recording it in $\mathtt{s}$), with the $\mathit{ir}^{\mathtt{x}_j}_i$, $r^{\mathtt{x}_j}_i$, $s^{\mathtt{x}_j}_i$ and $\mathit{vr}^{\mathtt{x}_j}_i$ events defined as follows (with fresh identifiers). We then define the rf relation for each of these read events in $S_i$. For each $(\mathtt{x}, r) \in \mathsf{RS}_{T_i}$, when $r$ (i.e. the read event in the specification class $T_i$ that reads the value of $\mathtt{x}$) reads from an event $w$ in the specification graph ($(w, r) \in \Gamma.\mathsf{rf}$), we add $(w, r^{\mathtt{x}}_i)$ to the rf relation of $G$ (the first line of $\mathsf{IRF}^2_i$ below). For version locks, if transaction $T_i$ also writes to $\mathtt{x}_j$, then the $\mathit{ir}^{\mathtt{x}_j}_i$ and $\mathit{vr}^{\mathtt{x}_j}_i$ events (reading and validating the value of version lock $\mathtt{vx}_j$) read from the lock event in $T_i$ that acquired $\mathtt{vx}_j$, namely $L^{\mathtt{x}_j}_i$. On the other hand, if transaction $T_i$ does not write to $\mathtt{x}_j$ and it reads the value of $\mathtt{x}_j$ written by $T_j$, then $\mathit{ir}^{\mathtt{x}_j}_i$ and $\mathit{vr}^{\mathtt{x}_j}_i$ read the value written to $\mathtt{vx}_j$ by $T_j$ when releasing it ($U^{\mathtt{x}_j}_j$). Lastly, if $T_i$ does not write to $\mathtt{x}_j$ and it reads the value of $\mathtt{x}_j$ written by the initial write, $\mathit{init}_{\mathtt{x}}$, then $\mathit{ir}^{\mathtt{x}_j}_i$ and $\mathit{vr}^{\mathtt{x}_j}_i$ read the value written to $\mathtt{vx}_j$ by the initial write to $\mathtt{vx}$, $\mathit{init}_{\mathtt{vx}}$.

$$\mathsf{IRF}^2_i \triangleq \bigcup_{(\mathtt{x},r)\in\mathsf{RS}_{T_i}} \left\{ (w, r^{\mathtt{x}}_i), (w', \mathit{ir}^{\mathtt{x}}_i), (w', \mathit{vr}^{\mathtt{x}}_i) \;\middle|\; \begin{array}{l} (w, r) \in \Gamma.\mathsf{rf} \wedge \big(\mathtt{x} \in \mathsf{WS}_{T_i} \Rightarrow w' = L^{\mathtt{x}}_i\big) \\ {} \wedge \big(\mathtt{x} \notin \mathsf{WS}_{T_i} \wedge \exists T_j.\ w \in T_j \Rightarrow w' = U^{\mathtt{x}}_j\big) \\ {} \wedge \big(\mathtt{x} \notin \mathsf{WS}_{T_i} \wedge w = \mathit{init}_{\mathtt{x}} \Rightarrow w' = \mathit{init}_{\mathtt{vx}}\big) \end{array} \right\}$$

$$r^{\mathtt{x}_j}_i = \mathtt{R}(\mathtt{x}_j, v) \qquad s^{\mathtt{x}_j}_i = \mathtt{W}(\mathtt{s}[\mathtt{x}_j], v) \ \text{ s.t. } \exists w.\ (w, r^{\mathtt{x}_j}_i) \in \mathsf{IRF}^2_i \wedge \mathsf{val_w}(w) = v$$

$$\mathit{ir}^{\mathtt{x}_j}_i = \mathit{vr}^{\mathtt{x}_j}_i = \mathtt{R}(\mathtt{vx}_j, v) \ \text{ s.t. } \exists w.\ (w, \mathit{ir}^{\mathtt{x}_j}_i) \in \mathsf{IRF}^2_i \wedge \mathsf{val_w}(w) = v$$

• $\mathit{Ts}_i = t_1 \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} t_n$ (when $\mathit{Ts}'_i = t'_1 \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} t'_n$), with each $t_j$ defined as follows:

$$\begin{aligned} t_j &= \mathtt{R}(\mathtt{s}[\mathtt{x}], v) && \text{when } t'_j = \mathtt{R}(\mathtt{x}, v) \\ t_j &= \mathtt{W}(\mathtt{x}, v) \xrightarrow{\mathsf{po}|_{\mathsf{imm}}} \mathtt{W}(\mathtt{s}[\mathtt{x}], v) && \text{when } t'_j = \mathtt{W}(\mathtt{x}, v) \end{aligned}$$

When $t'_j$ is a read event, $t_j$ has the same identifier as that of $t'_j$. When $t'_j$ is a write event, the first event in $t_j$ has the same identifier as that of $t'_j$, and the identifier of the second event is picked fresh.
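The R/W translation above can be illustrated concretely. The following Python sketch is ours, not the paper's: events are encoded as hypothetical `(kind, loc, val)` tuples, and `translate` is an illustrative helper. Reads are redirected to the snapshot array `s`, while writes are duplicated into `x` and `s[x]`:

```python
# Sketch: translating a specification transaction body Ts'_i into the
# implementation body Ts_i. Event tuples (kind, loc, val) are hypothetical.

def translate(body):
    out = []
    for kind, x, v in body:
        if kind == "R":
            # a transactional read is served from the snapshot s[x]
            out.append(("R", "s[" + x + "]", v))
        else:
            # a transactional write goes to x, immediately followed
            # (po|imm) by the bookkeeping write to s[x]
            out.append(("W", x, v))
            out.append(("W", "s[" + x + "]", v))
    return out
```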

We are now in a position to construct our implementation graph. Given a consistent execution graph Γ of the specification, we construct an execution graph *G* = (*E*, po,rf, mo) of the implementation as follows.
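The lock-ordering part of this construction ($\mathsf{IMO}_i$ threading each writer's acquire/release events through the per-location modification order, and $\mathsf{IRF}^1_i$ making each acquisition read from the previous release) can be sketched operationally. The sketch below is ours: `MO` maps each location to the mo-ordered list of transaction ids writing it, and the `(kind, txn, lock)` tuple encoding of events is an assumption for illustration only:

```python
# Sketch: building mo and rf edges on version-lock events from the
# per-location modification order, as described for IMO_i and IRF1_i.

def build_lock_edges(MO):
    mo_edges, rf_edges = set(), set()
    for y, txns in MO.items():
        prev = ("init", None, "v" + y)       # the init_vy write
        for i in txns:
            L = ("L", i, "v" + y)            # T_i acquires vy
            U = ("U", i, "v" + y)            # T_i releases vy
            mo_edges.add((prev, L))          # acquire comes right after prev
            mo_edges.add((L, U))             # release right after acquire
            rf_edges.add((prev, L))          # IRF1: acquisition reads prev
            prev = U
    return mo_edges, rf_edges

mo_edges, rf_edges = build_lock_edges({"x": [1, 2]})
```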


### **5 Robust Parallel Snapshot Isolation (RPSI)**

In the previous section we adapted the PSI semantics in [14] to STM settings, in the *absence* of non-transactional code. However, a reasonable STM should account for mixed-mode code where shared data is accessed by both transactional and non-transactional code. To remedy this, we explore the semantics of PSI STMs in the presence of non-transactional code with *weak isolation* guarantees (see Sect. 2.1). We refer to the weakly isolated behaviour of such PSI STMs as *robust parallel snapshot isolation* (RPSI), due to its ability to provide PSI guarantees between transactions even in the presence of non-transactional code.

**Fig. 3.** RPSI-inconsistent executions due to nt-rf (a); and t-rf (b)

In Sect. 5.1 we propose the first declarative specification of RPSI STM programs. Later in Sect. 5.2 we develop a lock-based reference implementation of our RPSI specification in the RA fragment. We then demonstrate that our implementation is both sound (Sect. 5.3) and complete (Sect. 5.4) with respect to our proposed specification.

### **5.1 A Declarative Specification of RPSI STMs in RA**

We formulate a declarative specification of RPSI semantics by adapting the PSI semantics presented in Sect. 4.1 to account for non-transactional accesses. As with the PSI specification in Sect. 4.1, throughout this section, we take RPSI execution graphs to be those in which T ⊆ (R∪W) \ U. That is, RPSI transactions consist solely of read and write events (excluding updates). As before, we characterise the set of executions admitted by RPSI as graphs that lack cycles of certain shapes. More concretely, as with the PSI specification, we consider an RPSI execution graph to be *consistent* if acyclic(rpsi-hb*loc* ∪ mo ∪ rb) holds, where rpsi-hb denotes the *'RPSI-happens-before'* relation, extended from that of PSI psi-hb.

**Definition 4 (RPSI consistency).** An RPSI execution graph Γ = (*E*, po, rf, mo, <sup>T</sup>) is consistent, written rpsi-consistent(Γ), if acyclic(rpsi-hb*loc* <sup>∪</sup> mo∪rb) holds, where rpsi-hb denotes the *'RPSI-happens-before'* relation, defined as the smallest relation that satisfies the following conditions:


The trans and psi-hb conditions ensure that rpsi-hb is transitive and that it includes po, rf and mo<sup>T</sup>, as with its PSI counterpart. The nt-rf condition ensures that if a value written by a non-transactional write w is observed (read from) by a read event r in a transaction T, then its effect is observed by *all* events in T. That is, w *happens-before* all events in T and not just r. This allows us to rule out executions such as the one depicted in Fig. 3a, which we argue must be disallowed by RPSI.

Consider the execution graph of Fig. 3a, where transaction <sup>T</sup><sup>1</sup> is denoted by the dashed box labelled <sup>T</sup><sup>1</sup>, comprising the read events <sup>r</sup><sup>1</sup> and <sup>r</sup>2. Note that as r<sup>1</sup> and r<sup>2</sup> are transactional reads without prior writes by the transaction, they constitute a *snapshot* of the memory at the time <sup>T</sup><sup>1</sup> started. That is, the values read by r<sup>1</sup> and r<sup>2</sup> must reflect a valid snapshot of the memory at the time it was taken. As such, since we have (w2, r2) <sup>∈</sup> rf, any event preceding <sup>w</sup><sup>2</sup> by the 'happens-before' relation must also be observed by (synchronise with) <sup>T</sup><sup>1</sup>. In particular, as <sup>w</sup><sup>1</sup> happens-before <sup>w</sup><sup>2</sup> ((w1, w2) <sup>∈</sup> po), the <sup>w</sup><sup>1</sup> write must also be observed by T1. The nt-rf thus ensures that a non-transactional write read from by a transaction (i.e. a snapshot read) synchronises with the entire transaction.

Recall from Sect. 4.1 that the PSI psi-hb relation includes rf<sup>T</sup>, which has not yet been included in rpsi-hb through the first three conditions described. As we describe shortly, the t-rf condition is indeed a strengthening of rf<sup>T</sup> that accounts for the presence of non-transactional events. In particular, note that rf<sup>T</sup> is included in the left-hand side of t-rf: when rpsi-hb in ([W]; st; (rpsi-hb \ st); st; [R]) is replaced with rf (recall that rf ⊆ rpsi-hb), the left-hand side yields rf<sup>T</sup>. As such, in the absence of non-transactional events, the definitions of psi-hb and rpsi-hb coincide.

Recall that inclusion of rf<sup>T</sup> in psi-hb ensured transactional synchronisation due to causal ordering: if <sup>T</sup><sup>1</sup> writes to <sup>x</sup> and <sup>T</sup><sup>2</sup> later (in psi-hb order) reads <sup>x</sup>, then <sup>T</sup><sup>1</sup> must synchronise with <sup>T</sup><sup>2</sup>. This was achieved in PSI because either (i) <sup>T</sup><sup>2</sup> reads <sup>x</sup> directly from <sup>T</sup><sup>1</sup> in which case <sup>T</sup><sup>1</sup> synchronises with <sup>T</sup><sup>2</sup> via rfT; or (ii) <sup>T</sup><sup>2</sup> reads <sup>x</sup> from another later (mo-ordered) transactional write in <sup>T</sup><sup>3</sup>, in which case <sup>T</sup><sup>1</sup> synchronises with <sup>T</sup><sup>3</sup> via moT, <sup>T</sup><sup>3</sup> synchronises with <sup>T</sup><sup>2</sup> via rfT, and thus <sup>T</sup><sup>1</sup> synchronises with <sup>T</sup><sup>2</sup> via moT;rfT. How are we then to extend rpsi-hb to guarantee transactional synchronisation due to causal ordering in the presence of non-transactional events?

To justify t-rf, we present an execution graph that does not guarantee synchronisation between causally ordered transactions and is nonetheless deemed RPSI-consistent *without* the t-rf condition on rpsi-hb. We thus argue that this execution must be precluded by RPSI, justifying the need for t-rf. Consider the execution in Fig. 3b. Observe that as transaction <sup>T</sup><sup>1</sup> writes to <sup>x</sup> via <sup>w</sup>1, transaction <sup>T</sup><sup>2</sup> reads <sup>x</sup> via <sup>r</sup>2, and (w1, r2) <sup>∈</sup> rpsi-hb (w<sup>1</sup> rf <sup>→</sup> <sup>r</sup><sup>1</sup> po <sup>→</sup> <sup>w</sup><sup>3</sup> rf <sup>→</sup> <sup>r</sup>2), <sup>T</sup><sup>1</sup> is causally ordered before <sup>T</sup><sup>2</sup> and hence <sup>T</sup><sup>1</sup> must synchronise with <sup>T</sup><sup>2</sup>. As such, the <sup>r</sup><sup>3</sup> in <sup>T</sup><sup>2</sup> must observe <sup>w</sup><sup>2</sup> in <sup>T</sup><sup>1</sup>: we must have (w2, r3) <sup>∈</sup> rpsi-hb, rendering the above execution RPSI-inconsistent. To enforce the rpsi-hb relation between such causally ordered transactions with intermediate non-transactional events, t-rf stipulates that if a transaction <sup>T</sup><sup>1</sup> writes to a location (e.g. <sup>x</sup> via <sup>w</sup><sup>1</sup> above), another transaction <sup>T</sup><sup>2</sup> reads from the same location (r2), and the two events are related by 'RPSI-happens-before' ((w1, r2) <sup>∈</sup> rpsi-hb), then <sup>T</sup><sup>1</sup> must synchronise with <sup>T</sup><sup>2</sup>. That is, all events in <sup>T</sup><sup>1</sup> must 'RPSI-happen-before' those in T<sup>2</sup>. Effectively, this allows us to transitively close the causal ordering between transactions, spanning transactional and non-transactional events in between.
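Since RPSI consistency (Definition 4) ultimately reduces to an acyclicity check on rpsi-hb|loc ∪ mo ∪ rb, checking a candidate execution amounts to cycle detection once the relations have been computed. The sketch below is ours and assumes the combined relation is already given as an explicit edge set over event identifiers (computing the rpsi-hb closure itself is not shown):

```python
# Sketch: a consistency check is, at its core, a cycle test on the union
# of the relevant relations, given here as a set of (source, target) pairs.

def acyclic(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(n):
        colour[n] = GREY                      # on the current DFS path
        for m in graph.get(n, []):
            c = colour.get(m, WHITE)
            if c == GREY:                     # back edge: a cycle
                return False
            if c == WHITE and not visit(m):
                return False
        colour[n] = BLACK                     # fully explored
        return True

    return all(visit(n) for n in graph if colour.get(n, WHITE) == WHITE)
```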

**Fig. 4.** A mixed-mode program with its annotated behaviour disallowed by RPSI (left); an RA-consistent execution graph of its RPSI implementation (right)

#### **5.2 A Lock-Based RPSI Implementation in RA**

We present a lock-based reference implementation of RPSI in the RA fragment (Fig. 2) by using sequence locks [13,18,23,32]. Our implementation is both sound and complete with respect to our declarative RPSI specification in Sect. 5.1.

The RPSI implementation in Fig. 2 is rather similar to its PSI counterpart. The main difference between the two is in how they *validate* the tentative snapshot recorded in s. As before, in order to ensure that no intermediate *transactional* writes have intervened since s was recorded, for each location x in RS, the validation phase revisits vx, inspecting whether its value has changed from that recorded in v[x]. If this is the case, the snapshot is deemed invalid and the process is restarted. However, checking against intermediate transactional writes alone is not sufficient, as it does not preclude the intervention of *non-transactional* writes. This is because, unlike transactional writes, non-transactional writes do not update the version locks, and as such their updates may go unnoticed. In order to rule out the possibility of intermediate non-transactional writes, for each location x the implementation checks the value of x against that recorded in s[x]. If the values do not agree, an intermediate non-transactional write has been detected: the snapshot fails validation and the process is restarted. Otherwise, the snapshot is successfully validated and returned in s. Observe that checking the value of x against s[x] does not entirely preclude the presence of non-transactional writes, in cases where the same value is written (non-transactionally) to x twice.
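The two validation checks described above (version unchanged, then value unchanged) can be sketched in Python. This is our illustrative sketch, not the paper's Fig. 2 code: `mem` and `ver` map locations to values and version locks, and the odd-means-locked convention is an assumption of the sketch:

```python
# Sketch of the RPSI snapshot-and-validate phase. Retries until a
# consistent snapshot of read_set is obtained; loops forever in this
# sketch if a version lock stays held.

def take_snapshot(read_set, mem, ver):
    while True:
        v = {}
        for x in read_set:
            vx = ver[x]
            if vx % 2 == 1:                  # vx held by a writer: retry
                break
            v[x] = vx                        # record observed versions
        else:
            s = {x: mem[x] for x in read_set}    # tentative snapshot
            # validate: no transactional write (version unchanged) and
            # no non-transactional write (value unchanged) intervened
            if all(ver[x] == v[x] for x in read_set) and \
               all(mem[x] == s[x] for x in read_set):
                return s
```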

To understand this, consider the mixed-mode program on the left of Fig. 4, comprising a transaction in the left-hand thread and a non-transactional program in the right-hand thread writing the same value (1) to z twice. Note that the annotated behaviour is disallowed under RPSI: all execution graphs of the program with the annotated behaviour yield RPSI-inconsistent execution graphs. Intuitively, this is because the values read by the transaction (x : 0, y : 0, z : 1) do not constitute a valid *snapshot*: at *no* point during the execution of this program are the values of x, y and z as annotated.

Nevertheless, it is possible to find an RA-consistent execution of the RPSI implementation in Fig. 2 that reads the annotated values as its snapshot. Consider the execution graph on the right-hand side of Fig. 4, depicting a particular execution of the RPSI implementation (Fig. 2) of the program on the left. The rx, ry and rz denote the events reading the initial snapshot of x, y and z and recording them in s (line 5), respectively. Similarly, the rx′, ry′ and rz′ denote the events validating the snapshots recorded in s (line 7). As T is the only transaction in the program, the version numbers vx, vy and vz remain unchanged throughout the execution, and we have thus omitted the events reading (line 2) and validating (line 7) their values from the execution graph. Note that this execution graph is RA-consistent even though we cannot find a corresponding RPSI-consistent execution with the same outcome. To ensure the soundness of our implementation, we must thus rule out such scenarios.

To do this, we assume that if multiple non-transactional writes write the same value to the same location, they cannot race with the same transaction. More concretely, we assume that *every* RPSI-consistent execution graph of a given program satisfies the following condition:

$$\begin{array}{l} \forall \mathtt{x}.\ \forall r \in \mathcal{T} \cap \mathcal{R}_{\mathtt{x}}.\ \forall w, w' \in \mathcal{NT} \cap \mathcal{W}_{\mathtt{x}}.\\ \quad w \neq w' \wedge \mathsf{val_w}(w) = \mathsf{val_w}(w') \wedge (r, w) \notin \textsf{rpsi-hb} \wedge (r, w') \notin \textsf{rpsi-hb}\\ \quad \Rightarrow (w, r) \in \textsf{rpsi-hb} \wedge (w', r) \in \textsf{rpsi-hb} \end{array} \qquad (*)$$

That is, given a transactional read $r$ from location x, and any two distinct non-transactional writes $w$, $w'$ of the same value to x, either (i) at least one of the writes RPSI-happens-after $r$; or (ii) they both RPSI-happen-before $r$.
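Condition (∗) can be stated as an executable predicate over a candidate execution. The sketch below is ours; the `(id, loc, val)` record shape for events and the representation of rpsi-hb as a set of pairs are assumptions for illustration:

```python
# Sketch: checking the race-freedom condition (*). t_reads are the
# transactional reads, nt_writes the non-transactional writes, hb the
# rpsi-hb relation as a set of (source, target) event-id pairs.

def satisfies_star(t_reads, nt_writes, hb):
    for (r, xr, _val) in t_reads:
        same_loc = [w for w in nt_writes if w[1] == xr]
        for i, w in enumerate(same_loc):
            for w2 in same_loc[i + 1:]:
                if (w[2] == w2[2]                 # same value written
                        and (r, w[0]) not in hb   # neither write is
                        and (r, w2[0]) not in hb):  # hb-after the read
                    # then both must rpsi-happen-before the read
                    if not ((w[0], r) in hb and (w2[0], r) in hb):
                        return False
    return True
```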

Observe that condition (∗) does not hold of the program in Fig. 4. Note that this stipulation does not prevent two *transactions* from writing the same value to a location x. As such, in the absence of non-transactional writes, our RPSI implementation is equivalent to that of PSI in Sect. 4.2.

#### **5.3 Implementation Soundness**

The RPSI implementation in Fig. 2 is *sound*: for each consistent implementation graph *G*, a corresponding specification graph Γ can be constructed such that rpsi-consistent(Γ) holds. In what follows we state our soundness theorem and briefly describe our construction of consistent specification graphs. We refer the reader to the technical appendix [4] for the full soundness proof.

**Theorem 3 (Soundness).** *Let* P *be a program that possibly mixes transactional and non-transactional code. If every RPSI-consistent execution graph of* P *satisfies the condition in* (∗)*, then for all RA-consistent implementation graphs G of the implementation in Fig. 2, there exists an RPSI-consistent specification graph* Γ *of the corresponding transactional program with the same program outcome.*

**Constructing Consistent Specification Graphs.** Constructing an RPSI-consistent specification graph from the implementation graph is similar to the corresponding PSI construction described in Sect. 4.3. More concretely, the events associated with non-transactional events remain unchanged and are simply added to the specification graph. On the other hand, the events associated with transactional events are adapted in a similar way to those of PSI in Sect. 4.3. In particular, observe that given an execution of the RPSI implementation with t transactions, as with the PSI implementation, the trace of each transaction $i \in \{1 \cdots t\}$ is of the form $\theta_i = \mathit{Ls}_i \xrightarrow{\mathsf{po}} \mathit{FS}_i \xrightarrow{\mathsf{po}} S_i \xrightarrow{\mathsf{po}} \mathit{Ts}_i \xrightarrow{\mathsf{po}} \mathit{Us}_i$, with $\mathit{Ls}_i$, $\mathit{FS}_i$, $S_i$, $\mathit{Ts}_i$ and $\mathit{Us}_i$ denoting sequences of events analogous to those of PSI. The difference between an RPSI trace $\theta_i$ and a PSI one lies in the $\mathit{FS}_i$ and $S_i$ sequences, which obtain the snapshot. In particular, the validation phases of $\mathit{FS}_i$ and $S_i$ in RPSI include an additional read for each location, to rule out intermediate non-transactional writes. As in the PSI construction, for each transactional trace $\theta_i$ of our implementation, we construct a corresponding trace of the specification as $\theta'_i = B_i \xrightarrow{\mathsf{po}} \mathit{Ts}'_i \xrightarrow{\mathsf{po}} E_i$, with $B_i$, $E_i$ and $\mathit{Ts}'_i$ as defined in Sect. 4.3.

Given a consistent RPSI implementation graph $G = (E, \mathsf{po}, \mathsf{rf}, \mathsf{mo})$, let $G.\mathcal{NT} \triangleq G.E \setminus \bigcup_{i \in \{1 \cdots t\}} \theta_i.E$ denote the non-transactional events of $G$. We construct a consistent RPSI specification graph $\Gamma = (E, \mathsf{po}, \mathsf{rf}, \mathsf{mo}, \mathcal{T})$ such that:


We refer the reader to the technical appendix [4] for the full proof demonstrating that the above construction of Γ yields a consistent specification graph.

### **5.4 Implementation Completeness**

The RPSI implementation in Fig. 2 is *complete*: for each consistent specification graph Γ a corresponding implementation graph *G* can be constructed such that RA-consistent(*G*) holds. We next state our completeness theorem and describe our construction of consistent implementation graphs. We refer the reader to the technical appendix [4] for the full completeness proof.

**Theorem 4 (Completeness).** *For all RPSI-consistent specification graphs* Γ *of a program, there exists an RA-consistent execution graph G of the implementation in Fig. 2 that has the same program outcome.*

**Constructing Consistent Implementation Graphs.** In order to construct an execution graph of the implementation *G* from the specification Γ, we follow similar steps to those in the corresponding PSI construction in Sect. 4.4. More concretely, the events associated with non-transactional events are unchanged and simply added to the implementation graph. For transactional events, given each trace θ′<sup>i</sup> of a transaction in the specification, as before we construct an analogous trace of the implementation by inserting the appropriate events for acquiring and inspecting the version locks, as well as obtaining a snapshot. For each transaction class <sup>T</sup><sup>i</sup> ∈ T /st, we first determine its read and write sets as before and subsequently decide the order in which the version locks are acquired and inspected. This then enables us to construct the 'reads-from' and 'modification-order' relations for the events associated with version locks.

Given a consistent execution graph of the specification $\Gamma = (E, \mathsf{po}, \mathsf{rf}, \mathsf{mo}, \mathcal{T})$ and a transaction class $T_i \in \Gamma.\mathcal{T}/\mathsf{st}$, we define $\mathsf{WS}_{T_i}$ and $\mathsf{RS}_{T_i}$ as described in Sect. 4.4. Determining the ordering of lock events hinges on a similar observation to that in the PSI construction. For each location $\mathtt{x}$, let the total order mo on the writes to $\mathtt{x}$ be given as $w_1 \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \cdots \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} w_{n_{\mathtt{x}}}$. This order can be broken into adjacent segments whose events are *either* non-transactional writes *or* belong to the *same* transaction. That is, given the transaction classes $\Gamma.\mathcal{T}/\mathsf{st}$, the order above is of the following form, where $T_1, \cdots, T_m \in \Gamma.\mathcal{T}/\mathsf{st}$ and for each such $T_i$ we have $\mathtt{x} \in \mathsf{WS}_{T_i}$ and $w_{(i,1)} \cdots w_{(i,n_i)} \in T_i$:

$$\underbrace{w_{(1,1)} \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \cdots \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} w_{(1,n_1)}}_{T_1} \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \cdots \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \underbrace{w_{(m,1)} \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} \cdots \xrightarrow{\mathsf{mo}|_{\mathsf{imm}}} w_{(m,n_m)}}_{T_m}$$

Were this not the case and we had $w_1 \xrightarrow{\mathsf{mo}} w \xrightarrow{\mathsf{mo}} w_2$ such that $w_1, w_2 \in T_i$ and $w \in T_j \neq T_i$, we would consequently have $w_1 \xrightarrow{\mathsf{mo_T}} w \xrightarrow{\mathsf{mo_T}} w_1$, contradicting the assumption that $\Gamma$ is consistent. We thus define $\Gamma.\mathsf{MO}_{\mathtt{x}} = [T_1 \cdots T_m]$.
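Computing $\Gamma.\mathsf{MO}_{\mathtt{x}} = [T_1 \cdots T_m]$ from the per-location write order then amounts to collapsing maximal runs of same-transaction writes. A small sketch of ours, where the `(event, txn)` pair encoding is an assumption and `txn` is `None` for a non-transactional write:

```python
# Sketch: recovering MO_x from the mo-ordered list of writes to x by
# collapsing adjacent writes of the same transaction into one segment.
# Non-transactional writes form their own segments and are skipped here.

def segment(writes):
    order = []
    for _event, txn in writes:
        if txn is not None and (not order or order[-1] != txn):
            order.append(txn)
    return order
```

This relies on the segment property argued above: in a consistent graph, a transaction's writes to x cannot be interleaved with another transaction's, so each transaction appears in at most one run.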

Note that each transactional execution trace of the specification is of the form $\theta'_i = B_i \xrightarrow{\mathsf{po}} \mathit{Ts}'_i \xrightarrow{\mathsf{po}} E_i$, with $B_i$, $E_i$ and $\mathit{Ts}'_i$ as described in Sect. 4.4. For each such $\theta'_i$, we construct a corresponding trace of our implementation as $\theta_i = \mathit{Ls}_i \xrightarrow{\mathsf{po}} S_i \xrightarrow{\mathsf{po}} \mathit{Ts}_i \xrightarrow{\mathsf{po}} \mathit{Us}_i$, where $\mathit{Ls}_i$, $\mathit{Ts}_i$ and $\mathit{Us}_i$ are as defined in Sect. 4.4, and $S_i = \mathit{tr}^{\mathtt{x}_1}_i \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} \mathit{tr}^{\mathtt{x}_p}_i \xrightarrow{\mathsf{po}} \mathit{vr}^{\mathtt{x}_1}_i \xrightarrow{\mathsf{po}} \cdots \xrightarrow{\mathsf{po}} \mathit{vr}^{\mathtt{x}_p}_i$ denotes the sequence of events obtaining a tentative snapshot ($\mathit{tr}^{\mathtt{x}_j}_i$) and subsequently validating it ($\mathit{vr}^{\mathtt{x}_j}_i$). Each $\mathit{tr}^{\mathtt{x}_j}_i$ sequence is of the form $\mathit{ivr}^{\mathtt{x}_j}_i \xrightarrow{\mathsf{po}} \mathit{ir}^{\mathtt{x}_j}_i \xrightarrow{\mathsf{po}} s^{\mathtt{x}_j}_i$, with $\mathit{ivr}^{\mathtt{x}_j}_i$, $\mathit{ir}^{\mathtt{x}_j}_i$ and $s^{\mathtt{x}_j}_i$ defined below (with fresh identifiers). Similarly, each $\mathit{vr}^{\mathtt{x}_j}_i$ sequence is of the form $\mathit{fr}^{\mathtt{x}_j}_i \xrightarrow{\mathsf{po}} \mathit{fvr}^{\mathtt{x}_j}_i$, with $\mathit{fr}^{\mathtt{x}_j}_i$ and $\mathit{fvr}^{\mathtt{x}_j}_i$ defined as follows (with fresh identifiers). We then define the rf relation for each of these read events in $S_i$ in a similar way.

For each $(\mathtt{x}, r) \in \mathsf{RS}_{T_i}$, when $r$ (the event in the specification class $T_i$ that reads the value of $\mathtt{x}$) reads from $w$ in the specification graph ($(w, r) \in \Gamma.\mathsf{rf}$), we add $(w, \mathit{ir}^{\mathtt{x}}_i)$ and $(w, \mathit{fr}^{\mathtt{x}}_i)$ to the rf of $G$ (the first line of $\mathsf{IRF}^2_i$ below). For version locks, as before, if transaction $T_i$ also writes to $\mathtt{x}_j$, then the $\mathit{ivr}^{\mathtt{x}_j}_i$ and $\mathit{fvr}^{\mathtt{x}_j}_i$ events (reading and validating $\mathtt{vx}_j$) read from the lock event in $T_i$ that acquired $\mathtt{vx}_j$, namely $L^{\mathtt{x}_j}_i$. Similarly, if $T_i$ does not write to $\mathtt{x}_j$ and it reads the value of $\mathtt{x}_j$ written by the initial write, $\mathit{init}_{\mathtt{x}}$, then $\mathit{ivr}^{\mathtt{x}_j}_i$ and $\mathit{fvr}^{\mathtt{x}_j}_i$ read the value written to $\mathtt{vx}_j$ by the initial write to $\mathtt{vx}$, $\mathit{init}_{\mathtt{vx}}$. Lastly, if transaction $T_i$ does not write to $\mathtt{x}_j$ and it reads $\mathtt{x}_j$ from a write other than $\mathit{init}_{\mathtt{x}}$, then $\mathit{ivr}^{\mathtt{x}_j}_i$ and $\mathit{fvr}^{\mathtt{x}_j}_i$ read from the unlock event of a transaction $T_j$ (i.e. $U^{\mathtt{x}_j}_j$) that has $\mathtt{x}$ in its write set and whose write to $\mathtt{x}$, $w_{\mathtt{x}}$, maximally 'RPSI-happens-before' $r$. That is, all other such writes that 'RPSI-happen-before' $r$ also 'RPSI-happen-before' $w_{\mathtt{x}}$.

$$\mathsf{IRF}^2_i \triangleq \bigcup_{(\mathtt{x},r)\in\mathsf{RS}_{T_i}} \left\{ (w, \mathit{ir}^{\mathtt{x}}_i), (w, \mathit{fr}^{\mathtt{x}}_i), (w', \mathit{ivr}^{\mathtt{x}}_i), (w', \mathit{fvr}^{\mathtt{x}}_i) \;\middle|\; \begin{array}{l} (w, r) \in \Gamma.\mathsf{rf} \wedge \big(\mathtt{x} \in \mathsf{WS}_{T_i} \Rightarrow w' = L^{\mathtt{x}}_i\big) \\ {} \wedge \big(\mathtt{x} \notin \mathsf{WS}_{T_i} \wedge w = \mathit{init}_{\mathtt{x}} \Rightarrow w' = \mathit{init}_{\mathtt{vx}}\big) \\ {} \wedge \big(\mathtt{x} \notin \mathsf{WS}_{T_i} \wedge w \neq \mathit{init}_{\mathtt{x}} \Rightarrow \exists w_{\mathtt{x}}, T_j.\ w_{\mathtt{x}} \in T_j \cap \mathcal{W}_{\mathtt{x}} \wedge w_{\mathtt{x}} \xrightarrow{\textsf{rpsi-hb}} r \wedge w' = U^{\mathtt{x}}_j \\ \quad {} \wedge \big[\forall w'_{\mathtt{x}}, T_k.\ w'_{\mathtt{x}} \in T_k \cap \mathcal{W}_{\mathtt{x}} \wedge w'_{\mathtt{x}} \xrightarrow{\textsf{rpsi-hb}} r \Rightarrow w'_{\mathtt{x}} \xrightarrow{\textsf{rpsi-hb}} w_{\mathtt{x}}\big]\big) \end{array} \right\}$$

$$\mathit{ir}^{\mathtt{x}_j}_i = \mathit{fr}^{\mathtt{x}_j}_i = \mathtt{R}(\mathtt{x}_j, v) \qquad s^{\mathtt{x}_j}_i = \mathtt{W}(\mathtt{s}[\mathtt{x}_j], v) \ \text{ s.t. } \exists w.\ (w, \mathit{ir}^{\mathtt{x}_j}_i) \in \mathsf{IRF}^2_i \wedge \mathsf{val_w}(w) = v$$

$$\mathit{ivr}^{\mathtt{x}_j}_i = \mathit{fvr}^{\mathtt{x}_j}_i = \mathtt{R}(\mathtt{vx}_j, v) \ \text{ s.t. } \exists w.\ (w, \mathit{ivr}^{\mathtt{x}_j}_i) \in \mathsf{IRF}^2_i \wedge \mathsf{val_w}(w) = v$$

We are now in a position to construct our implementation graph. Given a consistent execution graph Γ of the specification, we construct an execution graph of the implementation, *G* = (*E*, po,rf, mo), such that:


• $G.\mathsf{mo} = \Gamma.\mathsf{mo} \cup \Big( \bigcup_{T_i \in \Gamma.\mathcal{T}/\mathsf{st}} \mathsf{IMO}_i \Big)^{+}$, with $\mathsf{IMO}_i$ as defined in Sect. 4.4.

### **6 Conclusions and Future Work**

We studied PSI, for the first time to our knowledge, as a consistency model for STMs as it has several advantages over other consistency models, thanks to its performance and monotonic behaviour. We addressed two significant drawbacks of PSI which prevent its widespread adoption. First, the absence of a simple lock-based reference implementation to allow the programmers to readily understand and reason about PSI programs. To address this, we developed a lock-based reference implementation of PSI in the RA fragment of C11 (using sequence locks), that is both sound and complete with respect to its declarative specification. Second, the absence of a formal PSI model in the presence of mixed-mode accesses. To this end, we formulated a declarative specification of RPSI (robust PSI) accounting for both transactional and non-transactional accesses. Our RPSI specification is an extension of PSI in that in the absence of non-transactional accesses it coincides with PSI. To provide a more intuitive account of RPSI, we developed a simple lock-based RPSI reference implementation by adjusting our PSI implementation. We established the soundness and completeness of our RPSI implementation against its declarative specification.

As directions of future work, we plan to build on top of the work presented here in three ways. First, we plan to explore possible lock-based reference implementations for PSI and RPSI in the context of other weak memory models, such as the full C11 memory models [9]. Second, we plan to study other weak transactional consistency models, such as SI [10], ALA (asymmetric lock atomicity), ELA (encounter-time lock atomicity) [28], and those of ANSI SQL, including RU (read-uncommitted), RC (read-committed) and RR (repeatable reads), in the STM context. We aim to investigate possible lock-based reference implementations for these models that would allow the programmers to understand and reason about STM programs with such weak guarantees. Third, taking advantage of the operational models provided by our simple lock-based reference implementations (those presented in this article as well as those in future work), we plan to develop reasoning techniques that would allow us to verify properties of STM programs. This can be achieved by either extending existing program logics for weak memory, or developing new program logics for currently unsupported models. In particular, we can reason about the PSI models presented here by developing custom proof rules in the existing program logics for RA such as [22,39].

**Acknowledgments.** We thank the ESOP 2018 reviewers for their constructive feedback. This research was supported in part by a European Research Council (ERC) Consolidator Grant for the project "RustBelt", under the European Union's Horizon 2020 Framework Programme (grant agreement no. 683289). The second author was additionally partly supported by Len Blavatnik and the Blavatnik Family foundation.

### **References**



### Eventual Consistency for CRDTs

Radha Jagadeesan and James Riely

DePaul University, Chicago, USA {rjagadeesan,jriely}@cs.depaul.edu

Abstract. We address the problem of *validity* in eventually consistent (EC) systems: In what sense does an EC data structure satisfy the sequential specification of that data structure? Because EC is a very weak criterion, our definition does not describe every EC system; however, it is expressive enough to describe any Convergent or Commutative Replicated Data Type (CRDT).

### 1 Introduction

In a replicated implementation of a data structure, there are two impediments to requiring that all replicas achieve consensus on a global total order of the operations performed on the data structure (Lamport 1978): (a) the associated serialization bottleneck negatively affects performance and scalability (*e.g.* see (Ellis and Gibbs 1989)), and (b) the cap theorem imposes a tradeoff between consistency and partition-tolerance (Gilbert and Lynch 2002).

In systems based on *optimistic replication* (Vogels 2009; Saito and Shapiro 2005), a replica may execute an operation without synchronizing with other replicas. If the operation is a mutator, the other replicas are updated asynchronously. Due to the vagaries of the network, the replicas could receive and apply the updates in possibly different orders.

For sequential systems, the correctness problem is typically divided into two tasks: proving *termination* and proving *partial correctness*. Termination requires that the program eventually halt on all inputs, whereas partial correctness requires that the program only returns results that are allowed by the specification.

For replicated systems, the analogous goals are *convergence* and *validity*. Convergence requires that all replicas eventually agree. Validity requires that they agree on something sensible. In a replicated list, for example, if the only value put into the list is 1, then convergence ensures that all replicas eventually see the same value for the head of the list; validity requires that the value be 1.

Convergence has been well-understood since the earliest work on replicated systems. Convergence is typically defined as *eventual consistency*, which requires that once all messages are delivered, all replicas have the same state. *Strong eventual consistency* (sec) additionally requires convergence for all subsets of messages: replicas that have seen the same messages must have the same state.
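To make the convergence criterion concrete, here is a minimal Python sketch (ours, not from the paper) of a state-based grow-only counter, one of the simplest replicated structures satisfying sec: merge is a pointwise maximum, so replicas that have applied the same set of updates have the same state, regardless of delivery order or duplication.

```python
class GCounter:
    """Grow-only counter: one non-decreasing slot per replica id."""

    def __init__(self):
        self.slots = {}  # replica id -> local increment count

    def increment(self, replica):
        self.slots[replica] = self.slots.get(replica, 0) + 1

    def merge(self, other):
        # Pointwise maximum: commutative, associative and idempotent,
        # so message reordering and redelivery cannot cause divergence.
        for r, n in other.slots.items():
            self.slots[r] = max(self.slots.get(r, 0), n)

    def value(self):
        return sum(self.slots.values())
```

Two replicas that increment concurrently and then exchange states in either order converge to the same slots, and redelivering a state is harmless.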

Perhaps surprisingly, finding an appropriate definition of validity for replicated systems remains an open problem. There are solutions which use concurrent specifications, discussed below. But, as Shavit (2011) noted:

"It is infinitely easier and more intuitive for us humans to specify how abstract data structures behave in a sequential setting, where there are no interleavings. Thus, the standard approach to arguing the safety properties of a concurrent data structure is to specify the structure's properties sequentially, and find a way to map its concurrent executions to these 'correct' sequential ones."

In this paper we give the first definition of validity that is both (1) derived from standard sequential specifications and (2) validates the examples of interest.

We take the "examples of interest" to be *Convergent/Commutative Replicated Data Types* (crdts). These are replicated structures that obey certain monotonicity or commutativity properties. As an example of a crdt, consider the *add-wins set*, also called an "observed remove" set in Shapiro et al. (2011a). The add-wins set behaves like a sequential set if add and remove operations on the same element are ordered. The concurrent execution of an add and remove result in the element being added to the set; thus the remove is ignored and the "add wins." This concurrent specification is very simple, but as we will see in the next section, it is quite difficult to pin down the relationship between the crdt and the sequential specification used in the crdt's definition. This paper is the first to successfully capture this relationship.

Many replicated data types are crdts, but not all (Shapiro et al. 2011a). Notably, Amazon's Dynamo (DeCandia et al. 2007) is not a crdt. Indeed, interest in crdts is motivated by a desire to avoid the well-known concurrency anomalies suffered by Dynamo and other ad hoc systems (Bieniusa et al. 2012).

Shapiro et al. (2011b) introduced the notion of crdt and proved that every crdt has an sec implementation. Their definition of sec includes convergence, but not validity.

The validity requirement can be broken into two components. We describe these below using the example of a list data type that supports only two operations: the mutator put, which adds an element to the end of the list, and the query q, which returns the state of the list. This structure can be specified as a set of strings such as "put(1); put(3); q=[1,3]" and "put(1); put(2); put(3); q=[1,2,3]".


Burckhardt et al. (2012) provide a formal definition of validity using partial orders over events: linearizations respect the partial order on events; monotonicity is ensured by requiring that evolution extends the partial order. Similar definitions can be found in Jagadeesan and Riely (2015) and Perrin et al. (2015). Replicated data structures that are sound with respect to this definition enjoy many good properties, which we discuss throughout this paper. However, this notion of correctness is not general enough to capture common crdts, such as the add-wins set.

This lack of expressivity led Burckhardt et al. (2014) to abandon notions of validity that appeal directly to a sequential specification. Instead they work directly with *concurrent* specifications, formalizing the style of specification found informally in Shapiro et al. (2011b). This has been a fruitful line of work, leading to proof rules (Gotsman et al. 2016) and extensions (Bouajjani et al. 2014). See (Burckhardt 2014; Viotti and Vukolic 2016) for a detailed treatment.

Positively, concurrent specifications can be used to validate any replicated structure, including crdts as well as anomalous structures such as Dynamo. Negatively, concurrent specifications lack a clear connection to their sequential counterparts. In this paper, we restore this connection. We arrive at a definition of sec that admits crdts, but rejects Dynamo.

The following "corner cases" are a useful sanity-check for any proposed notion of validity.


psts and psm say that a replicated structure should behave sequentially when replication is not used. ppe says that the order of independent operations should not matter. Our definition implies all three conditions. Dynamo fails ppe (Bieniusa et al. 2012), and thus fails to pass our definition of sec.

In the next section, we describe the validity problem and our solution in detail, using the example of a binary set. The formal definitions follow in Sect. 3. We state some consequences of the definition and prove that the add-wins set satisfies our definition. In Sect. 4, we describe a collaborative text editor and prove that it is sec. In Sect. 5 we characterize the programmer's view of a crdt by defining the *most general* crdt that satisfies a given sequential specification. We show that any program that is correct using the most general crdt will be correct using a more restricted crdt. We also show that our validity criterion for sec is *local* in the sense of Herlihy and Wing (1990): independent structures can be verified independently. In Sect. 6, we apply these results to prove the correctness of a graph that is implemented using two sec sets.

Our work is inspired by the study of relaxed memory, such as (Alglave 2012). In particular, we have drawn insight from the rmo model of Higham and Kawash (2000).

### 2 Understanding Replicated Sets

In this section, we motivate the definition of sec using replicated sets as an example. The final definition is quite simple, but requires a fresh view of both executions and specifications. We develop the definition in stages, each of which requires a subtle shift in perspective. Each subsection begins with an example and ends with a summary.

#### 2.1 Mutators and Non-mutators

An *implementation* is a set of *executions*. We model executions abstractly as labelled partial orders (lpos). The ordering of the lpo captures the history that precedes an event, which we refer to as *visibility*.

[Execution (1): two replicas drawn horizontally, with time passing from left to right. Top: +0<sup>a</sup>; ✓0<sup>b</sup>; ✗1<sup>c</sup>; ✓0<sup>d</sup>; ✓1<sup>e</sup>. Bottom: +1<sup>f</sup>; ✗0<sup>g</sup>; ✓1<sup>h</sup>; ✓0<sup>i</sup>; ✓1<sup>j</sup>. Each replica's add becomes visible to the other replica before that replica's final two accessors.]

Here the events are a through j, with labels +0, +1, etc., and order represented by arrows. The lpo describes an execution with two replicas, shown horizontally, with time passing from left to right. Initially, the top replica receives a request to add 0 to the set (+0<sup>a</sup>). Concurrently, the bottom replica receives a request to add 1 (+1<sup>b</sup>). Then each replica is twice asked to report on the items contained in the set. At first, the top replica replies that 0 is present and 1 is absent (✓0<sup>b</sup>✗1<sup>c</sup>), whereas the bottom replica answers with the reverse (✗0<sup>g</sup>✓1<sup>h</sup>). Once the add operations are visible at all replicas, however, the replicas give the same responses (✓0<sup>d</sup>✓1<sup>e</sup> and ✓0<sup>i</sup> ✓1<sup>j</sup> ).

lpos with non-interacting replicas can be denoted compactly using sequential and parallel composition. For example, the prefix of (1) that only includes the first three events at each replica can be written (+0<sup>a</sup>; ✓0<sup>b</sup>; ✗1<sup>c</sup>) ∥ (+1<sup>f</sup>; ✗0<sup>g</sup>; ✓1<sup>h</sup>).

A *specification* is a set of *strings*. Let set be the specification of a sequential set with elements 0 and 1. Then we expect that set includes the string "+0✓0✗1", but not "+0✗0✓1". Indeed, each specification string can uniquely be extended with either ✓0 or ✗0 and either ✓1 or ✗1.
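Whether a string belongs to the set specification can be decided by replaying it against an abstract set state; the following Python sketch uses a label encoding of our own ("T" for a test answering ✓, "F" for a test answering ✗), chosen purely for illustration.

```python
def valid_set_string(actions):
    """Check that a sequence of actions over elements {0, 1} is a valid
    string of the sequential set specification: every membership test
    must report exactly the current contents."""
    state = set()
    for op, x in actions:
        if op == "+":
            state.add(x)
        elif op == "-":
            state.discard(x)
        elif op == "T" and x not in state:
            return False  # claimed present, but absent
        elif op == "F" and x in state:
            return False  # claimed absent, but present
    return True
```

For instance, "+0✓0✗1" is valid while "+0✗0✓1" is not, matching the text above.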

There is an isomorphism between strings and labelled *total* orders. Thus, specification strings correspond to the restricted class of lpos where the visibility relation provides a total order.

*Linearizability* (Herlihy and Wing 1990) is the gold standard for concurrent correctness in tightly coupled systems. Under linearizability, an execution is valid if there exists a linearization τ of the events in the execution such that for every event e, the prefix of e in τ is a valid specification string.

Execution (1) is not linearizable. The failure can already be seen in the sub-lpo (+0<sup>a</sup>; ✗1<sup>c</sup>) ∥ (+1<sup>f</sup>; ✗0<sup>g</sup>). Any linearization must have either +1<sup>f</sup> before ✗1<sup>c</sup> or +0<sup>a</sup> before ✗0<sup>g</sup>. In either case, the linearization is invalid for set.
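This impossibility can be confirmed mechanically. The following brute-force checker (our sketch, with a hypothetical label encoding: "T"/"F" for tests answering ✓/✗) searches for a linearization, consistent with visibility, in which every prefix is a valid specification string.

```python
from itertools import permutations

def set_valid(labels):
    # Replay a candidate specification string for the set over {0, 1}.
    state = set()
    for l in labels:
        op, x = l[0], l[1]
        if op == "+": state.add(x)
        elif op == "-": state.discard(x)
        elif op == "T" and x not in state: return False
        elif op == "F" and x in state: return False
    return True

def linearizable(events, visible):
    """Brute force: is there a linearization respecting `visible`
    (a set of (earlier, later) pairs) in which every prefix is a
    valid specification string?"""
    for perm in permutations(events):
        pos = {e: i for i, e in enumerate(perm)}
        if any(pos[d] >= pos[e] for d, e in visible):
            continue  # violates visibility order
        labels = [events[e] for e in perm]
        if all(set_valid(labels[:i + 1]) for i in range(len(labels))):
            return True
    return False
```

On the four-event sub-lpo above it reports failure, because any linearization places a ✗ test after the add it denies.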

Although it is not linearizable, execution (1) is admitted by every crdt set in Shapiro et al. (2011a). To validate such examples, Burckhardt et al. (2012) develop a weaker notion of validity by dividing labels into *mutators* and *accessors* (also known as non-mutators). Similar definitions appear in Jagadeesan and Riely (2015) and Perrin et al. (2015). Mutators change the state of a replica, and accessors report on the state without changing it. For set, the mutators **M** and non-mutators **M̄** are as follows.

**M** = {+0, -0, +1, -1}, representing addition and removal of bits 0 and 1. **M̄** = {✗0, ✓0, ✗1, ✓1}, representing membership tests returning false or true.

Define the *mutator prefix* of an event e to include e and the *mutators* visible to e. An execution is valid if there exists a linearization of the execution, τ , such that for every event e, the *mutator prefix* of e in τ is a valid specification string.

It is straightforward to see that execution (1) satisfies this weaker criterion. For both ✓0<sup>b</sup> and ✗1<sup>c</sup>, the mutator prefix is +0<sup>a</sup>. This includes +0<sup>a</sup> but not +1<sup>f</sup> , and thus their answers are validated. Symmetrically, the mutator prefixes of ✗0<sup>g</sup> and ✓1<sup>h</sup> only include +1<sup>f</sup> . The mutator prefixes for the final four events include both +0<sup>a</sup> and +1<sup>f</sup> , but none of the prior accessors.
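The mutator-prefix criterion can likewise be checked by brute force. In this sketch (ours, reusing the same hypothetical "T"/"F" label encoding; `visible` is assumed transitively closed), the sub-lpo that defeated linearizability now passes, as does the six-event prefix of execution (1).

```python
from itertools import permutations

def set_valid(labels):
    state = set()
    for l in labels:
        op, x = l[0], l[1]
        if op == "+": state.add(x)
        elif op == "-": state.discard(x)
        elif op == "T" and x not in state: return False
        elif op == "F" and x in state: return False
    return True

def mp_valid(events, visible):
    """Weaker criterion: some linearization tau such that, for every
    event e, the mutators visible to e (ordered as in tau) followed by
    e itself form a valid specification string."""
    below = {e: {d for (d, e2) in visible if e2 == e} for e in events}
    for perm in permutations(events):
        pos = {e: i for i, e in enumerate(perm)}
        if any(pos[d] >= pos[e] for d, e in visible):
            continue
        if all(set_valid(
                [events[d] for d in sorted(
                    (d for d in below[e] if events[d][0] in "+-"),
                    key=pos.get)] + [events[e]])
               for e in events):
            return True
    return False
```

Each accessor of execution (1) sees only one mutator, so its mutator prefix is validated regardless of how the two replicas' histories interleave.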

*Summary:* Convergent states must agree on the final order of mutators, but intermediate states may see incompatible subsequences of this order. By restricting attention to mutator prefixes, the later states need not linearize these incompatible views of the partial past.

This relaxation is analogous to the treatment of non-mutators in update serializability (Hansdah and Patnaik 1986; Garcia-Molina and Wiederhold 1982), which requires a global serialization order for mutators, ignoring non-mutators.

#### 2.2 Dependency

The following lpo is admitted by the add-wins set discussed in the introduction.

[Execution (2): two replicas. Top: +0<sup>a</sup>; +1<sup>b</sup>; -1<sup>c</sup>. Bottom: +1<sup>d</sup>; +0<sup>e</sup>; -0<sup>f</sup>. Two final accessors, ✓0<sup>g</sup> and ✓1<sup>h</sup>, see all six mutators.]

In any crdt implementation, the effect of +1<sup>b</sup> is negated by the subsequent -1<sup>c</sup>. The same reasoning holds for +0<sup>e</sup> and -0<sup>f</sup>. In an add-wins set, however, the *concurrent* adds, +0<sup>a</sup> and +1<sup>d</sup>, win over the deletions. Thus, in the final state both 0 and 1 are present.

This lpo is not valid under the definition of the previous subsection: Since ✓0<sup>g</sup> and ✓1<sup>h</sup> see the same mutators, they must agree on a linearization of (+0<sup>a</sup>; +1<sup>b</sup>; -1<sup>c</sup>) ∥ (+1<sup>d</sup>; +0<sup>e</sup>; -0<sup>f</sup>). Any linearization must end in either -1<sup>c</sup> or -0<sup>f</sup>; thus it is not possible for both ✓0<sup>g</sup> and ✓1<sup>h</sup> to be valid.

Similar issues arise in relaxed memory models, where program order is often relaxed between uses of independent variables (Alglave et al. 2014). Generalizing, we write m # n to indicate that labels m and n are dependent. Dependency is a property of a *specification*, not an implementation. Our results only apply to specifications that support a suitable notion of dependency, as detailed in Sect. 3. For set, # is an equivalence relation with two equivalence classes, corresponding to actions on the independent values 0 and 1.

\# = {+0, -0, ✗0, ✓0}² ∪ {+1, -1, ✗1, ✓1}², where D² = D × D.

While the dependency relation for set is an equivalence, this is not required: in Sect. 4 we establish the correctness of a collaborative text-editing protocol with an intransitive dependency relation.

The *dependent restriction* of (2) is as follows.

[Diagram (3): the dependent restriction of (2). The 0-actions form +0<sup>a</sup> in parallel with +0<sup>e</sup>; -0<sup>f</sup>, all visible to ✓0<sup>g</sup>; the 1-actions form +1<sup>d</sup> in parallel with +1<sup>b</sup>; -1<sup>c</sup>, all visible to ✓1<sup>h</sup>. No order remains between actions on different elements.]

In the previous subsection, we defined validity using the *mutator prefix* of an event. We arrive at a weaker definition by restricting attention to the *mutator prefix of the dependent restriction*.

Under this definition, execution (2) is validated: any interleaving of the strings +0<sup>e</sup> -0<sup>f</sup> +0<sup>a</sup> ✓0<sup>g</sup> and +1<sup>b</sup> -1<sup>c</sup> +1<sup>d</sup> ✓1<sup>h</sup> linearizes the dependent restriction of (2) given in (3).
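Computing a dependent restriction is mechanical: drop every visibility edge whose endpoints carry independent labels. A sketch (ours, with a hypothetical label encoding in which the second character names the element; for set, dependence is "mentions the same element"):

```python
def dependent_restriction(visible, label):
    """Keep only visibility edges between dependent labels.
    `visible` is a set of (earlier, later) event pairs."""
    def dependent(l1, l2):
        return l1[1] == l2[1]  # same element => dependent
    return {(d, e) for (d, e) in visible if dependent(label[d], label[e])}
```

Applied to execution (2), the order between -1<sup>c</sup> and its same-element predecessor survives, while the order between the independent +0<sup>a</sup> and +1<sup>b</sup> is dropped.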

*Summary:* crdts allow independent mutators to commute. We formalize this intuition by restricting attention to mutator prefixes of the dependent restriction. The crdt must respect program order between dependent operations, but is free to reorder independent operations.

This relaxation is analogous to the distinction between *program order* and *preserved* program order (ppo) in relaxed memory models (Higham and Kawash 2000; Alglave 2012). Informally, ppo is the suborder of program order that removes order between independent memory actions, such as successive reads on different locations without an intervening memory barrier.

#### 2.3 Puns

The following lpo, execution (4), is admitted by the add-wins set. [Diagram (4): two replicas. Top: +0<sup>a</sup>; -0<sup>b</sup>; ✓0<sup>c</sup>; ✗0<sup>d</sup>. Bottom: +0<sup>e</sup>; -0<sup>f</sup>; ✓0<sup>g</sup>; ✗0<sup>h</sup>. In addition, ✓0<sup>c</sup> sees +0<sup>e</sup>, ✓0<sup>g</sup> sees +0<sup>a</sup>, and the final accessors ✗0<sup>d</sup> and ✗0<sup>h</sup> see all four mutators.]

As in execution (2), the add +0<sup>a</sup> is undone by the following remove -0<sup>b</sup>, but the concurrent add +0<sup>e</sup> wins over -0<sup>b</sup>, allowing ✓0<sup>c</sup>. In effect, ✓0<sup>c</sup> sees the order of the mutators as +0<sup>a</sup> -0<sup>b</sup> +0<sup>e</sup>. Symmetrically, ✓0<sup>g</sup> sees the order as +0<sup>e</sup> -0<sup>f</sup> +0<sup>a</sup>. While this is very natural from the viewpoint of a crdt, there is no linearization of the events that includes both +0<sup>a</sup> -0<sup>b</sup> +0<sup>e</sup> and +0<sup>e</sup> -0<sup>f</sup> +0<sup>a</sup>, since +0<sup>a</sup> and +0<sup>e</sup> must appear in different orders.

Indeed, this lpo is not valid under the definition of the previous subsection. First note that all events are mutually dependent. To prove validity we must find a linearization that satisfies the given requirements. Any linearization of the mutators must end in either -0<sup>b</sup> or -0<sup>f</sup>. Suppose we choose +0<sup>a</sup> -0<sup>b</sup> +0<sup>e</sup> -0<sup>f</sup> and look for a mutator prefix to satisfy ✓0<sup>g</sup>. (All other choices lead to similar problems.) Since -0<sup>f</sup> precedes ✓0<sup>g</sup> and is the last mutator in our chosen linearization, every possible witness for ✓0<sup>g</sup> must end with mutator -0<sup>f</sup>. Indeed the only possible witness is +0<sup>a</sup> +0<sup>e</sup> -0<sup>f</sup> ✓0<sup>g</sup>. However, this is not a valid specification string.

The problem is that we are linearizing *events*, rather than *labels*. If we shift to linearizing labels, then execution (4) is allowed. Fix the final order for the mutators to be +0 -0 +0 -0. The execution is allowed if we can find a subsequence that linearizes the labels visible at each event. It suffices to choose the witnesses as follows. In the table, we group events with a common linearization together.

| Events | Witness |
|---|---|
| +0<sup>a</sup>, +0<sup>e</sup> | +0 |
| -0<sup>b</sup>, -0<sup>f</sup> | +0-0 |
| ✓0<sup>c</sup>, ✓0<sup>g</sup> | +0-0+0 ✓0 |
| ✗0<sup>d</sup>, ✗0<sup>h</sup> | +0-0+0-0 ✗0 |

Each of these is a valid specification string. In addition, looking only at mutators, each is a subsequence of +0 -0 +0 -0.

In execution (4), each of the witnesses is actually a *prefix* of the final mutator order, but, in general, it is necessary to allow *subsequences*.

[Execution (5): a remove -0 at one replica, concurrent with +0<sup>a</sup>; ✓0<sup>b</sup> at another; event b sees only the add.]

Execution (5) is admitted by the add-wins set. It is validated by the final mutator sequence -0 +0. The mutator prefix +0 of b is a subsequence of -0 +0, but not a prefix.

*Summary:* While dependent events at a single replica must be linearized in order, concurrent events may slip anywhere into the linearization. A crdt may *pun* on concurrent events with the same label, using them in different positions at different replicas. Thus a crdt may establish a final total order over the labels of an execution even when there is no linearization of the events.

#### 2.4 Frontiers

In the introduction, we mentioned that the validity problem can be decomposed into the separate concerns of *linearizability* and *monotonicity.* The discussion thus far has centered on the appropriate meaning of linearizability for crdts. In this subsection and the next, we look at the constraints imposed by monotonicity.

Consider the prefix {+0<sup>a</sup>, -0<sup>b</sup>, +0<sup>e</sup>, ✓0<sup>c</sup>, -0<sup>f</sup>} of execution (4), extended with action ✗0<sup>x</sup>, with visibility order as follows. [Execution (6): ✗0<sup>x</sup> sees the mutators +0<sup>a</sup>, -0<sup>b</sup> and +0<sup>e</sup>, but neither ✓0<sup>c</sup> nor -0<sup>f</sup>.]

This execution is *not strong* ec, since ✓0<sup>c</sup> and ✗0<sup>x</sup> see exactly the same mutators, yet provide incompatible answers.

Unfortunately, execution (6) is valid by the definition given in the previous section: The witnesses for a–f are as before. In particular, the witness for ✓0<sup>c</sup> is "+0-0+0✓0". The witness for ✗0<sup>x</sup> is "+0+0-0✗0". In each case, the mutator prefix is a subsequence of the global mutator order "+0-0+0-0".

It is well known that punning can lead to bad jokes. In this case, the problem is that ✗0<sup>x</sup> is punning on a concurrent -0 that cannot be matched by a visible -0 in its history: the execution -0 that is visible to ✗0<sup>x</sup> must appear *between* the two +0 operations; the specification -0 that is used by ✗0<sup>x</sup> must appear *after*. The final states of execution (4) have seen both remove operations, therefore the pun is harmless there. But ✓0<sup>c</sup> and ✗0<sup>x</sup> have seen only one remove. They must agree on how it is used.

Up to now, we have discussed the linearization of each event in isolation. We must also consider the relationship between these linearizations. When working with linearizations of *events*, it is sufficient to require that the linearization chosen for each event be a supersequence of the linearization chosen for each visible predecessor; since events are unique, there can be no confusion in the linearization about which event is which. Execution (6) shows that when working with linearizations of *labels*, it is insufficient to consider the relationship between individual events. The linearization "+0+0-0✗0" chosen for ✗0<sup>x</sup> is a supersequence of those chosen for its predecessors: "+0" for +0<sup>e</sup> and "+0-0" for -0<sup>b</sup>. The linearization "+0-0+0✓0" chosen for ✓0<sup>c</sup> is also a supersequence for the same predecessors. And yet, ✓0<sup>c</sup> and ✗0<sup>x</sup> are incompatible states.

Sequential systems have a single state, which evolves over time. In distributed systems, each replica has its own state, and it is this *set* of states that evolves. Such a set of states is called a *(consistent) cut* (Chandy and Lamport 1985).

A *cut* of an lpo is a sub-lpo that is down-closed with respect to visibility. The *frontier* of a cut is the set of maximal elements. For example, there are 14 frontiers of execution (6): the singletons {+0<sup>a</sup>}, {-0<sup>b</sup>}, {✓0<sup>c</sup>}, {+0<sup>e</sup>}, {-0<sup>f</sup>}, {✗0<sup>x</sup>}, the pairs {+0<sup>a</sup>, +0<sup>e</sup>}, {+0<sup>a</sup>, -0<sup>f</sup>}, {-0<sup>b</sup>, +0<sup>e</sup>}, {-0<sup>b</sup>, -0<sup>f</sup>}, {✓0<sup>c</sup>, -0<sup>f</sup>}, {✓0<sup>c</sup>, ✗0<sup>x</sup>}, {✗0<sup>x</sup>, -0<sup>f</sup>}, and the triple {✓0<sup>c</sup>, ✗0<sup>x</sup>, -0<sup>f</sup>}. As we explain below, we consider non-mutators in isolation. Thus we do not consider the last four cuts, which include a non-mutator with other events. That leaves 10 frontiers. The definition of the previous section only considered the 6 singletons. Singleton frontiers are generated by *pointed cuts*, with a single maximal element.
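Cuts and frontiers can be enumerated directly. The following Python sketch (ours) reproduces the count of 14 frontiers for execution (6), given each event's set of strict visibility predecessors (assumed transitively closed).

```python
from itertools import chain, combinations

def frontiers(events, below):
    """Enumerate the frontiers of all non-empty cuts.
    `below[e]` is the set of strict predecessors of e; a cut is a
    down-closed set of events; its frontier is its set of maximal
    elements."""
    all_subsets = chain.from_iterable(
        combinations(events, n) for n in range(1, len(events) + 1))
    result = set()
    for cut in map(set, all_subsets):
        if any(not below[e] <= cut for e in cut):
            continue  # not down-closed, hence not a cut
        maximal = {e for e in cut if not any(e in below[d] for d in cut)}
        result.add(frozenset(maximal))
    return result
```

With the visibility of execution (6) (✓0<sup>c</sup> and ✗0<sup>x</sup> each above {a, b, e}, f above e), the enumeration yields exactly the 14 frontiers listed above.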

When applied to frontiers, the monotonicity requirement invalidates execution (6). Monotonicity requires that the linearization chosen for a frontier be a subsequence of the linearization chosen for any extension of that frontier. If we are to satisfy state ✓0<sup>c</sup> in execution (6), the frontier {-0<sup>b</sup>, +0<sup>e</sup>} must linearize to "+0-0+0". If we are to satisfy state ✗0<sup>x</sup>, the frontier {-0<sup>b</sup>, +0<sup>e</sup>} must linearize to "+0+0-0". Since we require a unique linearization for each frontier, the execution is disallowed.

Since crdts execute non-mutators locally, it is important that we ignore frontiers with multiple non-mutators. Recall execution (4).

There is no specification string that linearizes the cut with frontier {✓0<sup>c</sup>, ✓0<sup>g</sup>}, since we cannot have ✓0 immediately after -0. If we consider only pointed cuts for non-mutators, then the execution is sec, with witnesses as follows.


In order to validate non-mutators, we *must* consider singleton non-mutator frontiers. The example shows that we *must not* consider frontiers with multiple non-mutators. There is some freedom in the choices otherwise. For set, we can "saturate" an execution with accessors by augmenting the execution with accessors that witness each cut of the mutators. In a saturated execution, it is sufficient to consider only the *pointed accessor* cuts, which end in a maximal accessor. For non-saturated executions, we are forced to examine each mutator cut: it is possible that a future accessor extension may witness that cut. The status of "mixed" frontiers, which include mutators with a single maximal non-mutator, is open for debate. We choose to ignore them, but the definition does not change if they are included.

*Summary:* A crdt must have a strategy for linearizing all mutator labels, even in the face of partitions. In order to ensure *strong* ec, the definition must consider sets of events across multiple replicas. Because non-mutators are resolved locally, sec must ignore frontiers with multiple non-mutators.

Cuts and frontiers are well-known concepts in the literature of distributed systems (Chandy and Lamport 1985). It is natural to consider frontiers when discussing the evolving correctness of a crdt.

#### 2.5 Stuttering

Consider the following execution. [Execution (7): one partition performs +0<sup>a</sup>; -0<sup>b</sup>; +0<sup>c</sup>, followed by concurrent removes -0<sup>d</sup> and -0<sup>e</sup>; the other performs +0<sup>x</sup>, followed by concurrent removes -0<sup>y</sup> and -0<sup>z</sup>.]

This lpo represents a partitioned system with events a–e in one partition and x–z in the other. As the partition heals, we must be able to account for the intermediate states. Because of the large number of events in this example, we have elided all accessors. We will present the example using the semantics of the add-wins set. Recall that the add-wins set validates ✓0 if and only if there is a maximal +0 beforehand. Thus, a replica that has seen the cut with frontier {+0<sup>a</sup>, -0<sup>y</sup>, -0<sup>z</sup>} must answer ✓0, whereas a replica that has seen {-0<sup>b</sup>, -0<sup>y</sup>, -0<sup>z</sup>} must answer ✗0.

Any linearization of {+0<sup>a</sup>, -0<sup>y</sup>, -0<sup>z</sup>} must end in +0, since the add-wins set must reply ✓0: the only possibility is "+0-0-0+0". The linearization of {-0<sup>b</sup>, -0<sup>y</sup>, -0<sup>z</sup>} must end in -0. If it must be a supersequence, the only possibility is "+0-0-0+0-0". Taking one more step on the left, {+0<sup>c</sup>, -0<sup>y</sup>, -0<sup>z</sup>} must linearize to "+0-0-0+0-0+0". Thus the final state {-0<sup>d</sup>, -0<sup>e</sup>, -0<sup>y</sup>, -0<sup>z</sup>} must linearize to "+0-0-0+0-0+0-0-0". Reasoning symmetrically, the linearization of {-0<sup>d</sup>, -0<sup>e</sup>, +0<sup>x</sup>} must be "+0-0+0-0-0+0", and thus the final {-0<sup>d</sup>, -0<sup>e</sup>, -0<sup>y</sup>, -0<sup>z</sup>} must linearize to "+0-0+0-0-0+0-0-0". The constraints on the final state are incompatible. Each of these states can be verified in isolation; it is the relation between them that is not satisfiable.

Recall that monotonicity requires that the linearization chosen for a frontier be a *subsequence* of the linearization chosen for any extension of that frontier. The difficulty here is that the subsequence relation ignores the similarity between "+0-0-0+0-0+0-0-0" and "+0-0+0-0-0+0-0-0". Neither of these is a subsequence of the other, yet they capture exactly the same sequence of *states*, each with six alternations between ✗0 and ✓0. The canonical state-based representative for these sequences is "+0-0+0-0+0-0".

crdts are defined in terms of states. In order to relate crdts to sequential specifications, it is necessary to extract information about states from the specification itself. Adapting Brookes (1996), we define strings as *stuttering equivalent* (notation σ ∼ τ) if they pass through the same states. So +0+1+0 ∼ +0+1 but +0-0+0 ≁ +0. If we consider subsequences up to stuttering, then execution (7) is sec, with witnesses as follows:

- {a}, {x}, {a, x} : +0
- {b}, {y}, {y, z}, {z} : +0-0
- {a, y}, {a, y, z}, {a, z}, {b, x} : +0-0+0
- {b, y}, {b, y, z}, {b, z}, {d}, {d, e}, {e} : +0-0+0-0
- {c, y}, {c, y, z}, {c, z}, {d, x}, {d, e, x}, {e, x} : +0-0+0-0+0
- {d, y}, {d, y, z}, {d, z}, {e, y}, {e, y, z}, {e, z}, {d, e, y}, {d, e, y, z}, {d, e, z} : +0-0+0-0+0-0

Recall that without stuttering, we deduced that {+0<sup>c</sup>, -0<sup>y</sup>, -0<sup>z</sup>} must linearize to "+0-0-0+0-0+0" and {-0<sup>d</sup>, -0<sup>e</sup>, +0<sup>x</sup>} must linearize to "+0-0+0-0-0+0". Under stuttering equivalence, these are the same, with canonical representative "+0-0+0-0+0". Thus, monotonicity under stuttering allows both linearizations to be extended to satisfy the final state {-0<sup>d</sup>, -0<sup>e</sup>, -0<sup>y</sup>, -0<sup>z</sup>}, which has canonical representative "+0-0+0-0+0-0".
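Stuttering equivalence is easy to operationalize for set: replay a string and record the distinct consecutive states it passes through. A sketch (ours, with a hypothetical label encoding; accessors leave the state unchanged):

```python
def state_sequence(trace):
    """Canonical representative of a set specification string:
    the sequence of distinct consecutive states it passes through."""
    states, current = [frozenset()], frozenset()
    for act in trace:
        op, x = act[0], act[1]
        if op == "+":
            current = current | {x}
        elif op == "-":
            current = current - {x}
        # accessors ("T"/"F") do not change the state
        if current != states[-1]:
            states.append(current)  # record only genuine state changes
    return states

def stutter_equiv(s, t):
    return state_sequence(s) == state_sequence(t)
```

This reproduces the examples above: +0+1+0 ∼ +0+1 (the repeated add is idempotent) while +0-0+0 ≁ +0 (the intermediate empty state is observable).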

*Summary:* crdts are described in terms of convergent states, whereas specifications are described as strings of actions. Actions correspond to labels in the lpo of an execution. Many strings of actions may lead to equivalent states. For example, idempotent actions can be applied repeatedly without modifying the state.

The stuttering equivalence of Brookes (1996) addresses this mismatch. In order to capture the validity of crdts, the definition of subsequence must change from a definition over individual specification strings to a definition over *equivalence classes* of strings *up to stuttering*.

### 3 Eventual Consistency for CRDTs

This section formalizes the intuitions developed in Sect. 2. We define executions, specifications and strong eventual consistency (sec). We discuss properties of eventual consistency and prove that the add-wins set is sec.

### 3.1 Executions

An execution realizes *causal delivery* if, whenever an event is received at a replica, all predecessors of the event are also received. Most of the crdts in Shapiro et al. (2011a) assume causal delivery, and we assumed it throughout the introductory section. There are costs to maintaining causality, however, and not all crdts require executions to incur these costs. In the formal development, we allow non-causal executions.

Shapiro et al. (2011a) draw executions as timelines, explicitly showing the delivery of remote mutators. Below left, we give an example of such a timeline.

This is a non-causal execution: at the bottom replica, +1 is received before +0, even though +0 precedes +1 at the top replica.

Causal executions are naturally described as Labelled Partial Orders (lpos), which are transitive and antisymmetric. Section 2 presented several examples of lpos. To capture non-causal systems, we move to *Labelled Visibility Orders* (lvos), which are merely acyclic. Acyclicity ensures that the transitive closure of an lvo is an lpo. The right picture above shows the lvo corresponding to the timeline on the left. The zigzag arrow represents an intransitive communication. When drawing executions, we use straight lines for "transitive" edges, with the intuitive reading that "this and all preceding actions are delivered".

lvos arise directly due to non-causal implementations. As we will see in Sect. 4, they also arise via projection from an lpo.

lvos are unusual in the literature. To make this paper self-contained, we define the obvious generalizations of concepts familiar from lpos, including isomorphism, suborder, restriction, maximality, downclosure and cut.

Fix a set **L** of labels. A *Labelled Visibility Order* (lvo, also known as an *execution*) is a triple u = ⟨Eu, λu, ≤u⟩ where Eu is a finite set of events, λu ∈ (Eu → **L**) and ≤u ⊆ (Eu × Eu) is reflexive and acyclic.
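To make the definition concrete, the following sketch represents an lvo and checks acyclicity of the strict part of its visibility relation. The class name, the integer event encoding and the ASCII labels are ours, not part of the formalism.

```python
from dataclasses import dataclass

@dataclass
class LVO:
    """A labelled visibility order: a finite event set, a labelling, and a
    visibility relation given here by its strict pairs (reflexivity is left
    implicit)."""
    events: set
    label: dict   # event -> label in L
    vis: set      # pairs (d, e) meaning d is visible to e

    def is_acyclic(self) -> bool:
        """Depth-first search for a cycle in the strict visibility relation."""
        succ = {d: set() for d in self.events}
        for (d, e) in self.vis:
            if d != e:                    # ignore reflexive pairs
                succ[d].add(e)
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {d: WHITE for d in self.events}
        def dfs(d):
            colour[d] = GREY
            for e in succ[d]:
                if colour[e] == GREY:     # back edge: a cycle
                    return False
                if colour[e] == WHITE and not dfs(e):
                    return False
            colour[d] = BLACK
            return True
        return all(colour[d] != WHITE or dfs(d) for d in self.events)
```

An lpo is then the special case in which the relation is additionally transitive.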

Let u, v range over lvos. Many concepts extend smoothly from lpos to lvos.


*Replica-Specific Properties.* In the literature on replicated data types, some properties of interest (such as "read your writes" (Tanenbaum and Steen 2007)) require the concept of "session" or a distinction between local and remote events. These can be accommodated by augmenting lvos with a replica labelling ρu ∈ (Eu → **R**), which maps events to a set **R** of *replica identifiers*.

Executions can be generated operationally as follows: Replicas receive mutator and accessor events from the local client; they also receive mutator events that are forwarded from other replicas. Each replica maintains a set of *seen* events: an event that is received is added to this set. When an event is received from the local client, the event is additionally added to the execution, with the predecessors in the visibility relation corresponding to the current *seen* set. If we wish to restrict attention to causal executions, then we require that replicas forward all the mutators in their *seen* sets, rather than individual events, and, thus, the visibility relation is transitive over mutators.

All executions that are operationally generated satisfy the additional property that ≤u is per-replica total: if ρ(d) = ρ(e) then either d ≤u e or e ≤u d.
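The operational generation described above can be sketched as follows; the class and method names are illustrative, not from the paper. Each replica keeps a *seen* set, a locally received event becomes visible-after everything currently seen, and non-causal delivery forwards a single event without its predecessors.

```python
import itertools

class Replica:
    """Minimal sketch of the operational generation of executions."""
    counter = itertools.count()

    def __init__(self, execution):
        self.seen = set()
        self.execution = execution   # shared dict: event -> (label, predecessors)

    def local(self, label):
        """A local client event: visible-after the current seen set."""
        e = next(Replica.counter)
        self.execution[e] = (label, frozenset(self.seen))
        self.seen.add(e)
        return e

    def deliver(self, e):
        """Non-causal delivery: a forwarded event is added to 'seen'
        without its predecessors, so visibility need not be transitive."""
        self.seen.add(e)

# The non-causal timeline from the text: +1 arrives at the second
# replica before +0.
execution = {}
r1, r2 = Replica(execution), Replica(execution)
a = r1.local('+0')
b = r1.local('+1')      # b sees a
r2.deliver(b)           # only +1 is forwarded
c = r2.local('?')       # c sees b but not a
```

Restricting to causal executions corresponds to forwarding the whole seen set along with each event.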

<sup>1</sup> We use the standard definitions for restriction on functions and relations. Given a function f : E → X, a relation R ⊆ E × E and D ⊆ E, define f↾D = {⟨d, f(d)⟩ | d ∈ D} and R↾D = {⟨d1, d2⟩ | d1, d2 ∈ D and d1 R d2}.

We do not demand per-replica totality because our results do not rely on replica-specific information.

### 3.2 Specifications and Stuttering Equivalence

Specifications are sets of strings, equipped with a distinguished set of mutators and a dependency relation between labels. Specifications are subject to some constraints which ensure that the mutator set and dependency relation are sensible; these are inspired by the conditions on Mazurkiewicz traces (Diekert and Rozenberg 1995). Every specification yields a derived notion of stuttering equivalence. This leads to the definition of *observational subsequence* (≤obs).

We use standard notation for strings: Let σ and τ range over strings. Then στ denotes concatenation, σ∗ denotes Kleene star, σ ∥ τ denotes the set of interleavings, ε denotes the empty string and σi denotes the i-th element of σ. These notations lift to sets of strings via set union.

A *specification* is a quadruple ⟨**L**, **M**, #, Σ⟩ where


Let **M̄** = **L** \ **M** be the set of *non-mutators*.

A specification must satisfy the following properties:


Property (b) ensures that non-mutators do not affect the state of the data structure. Property (c) ensures that commuting of independent actions does not affect the state of the data structure.

Recall that the set specification takes **M** = {+0, -0, +1, -1}, representing addition and removal of the bits 0 and 1, and **M̄** = {✗0, ✓0, ✗1, ✓1}, representing membership tests returning false or true. The dependency relation is # = {+0, -0, ✗0, ✓0}² ∪ {+1, -1, ✗1, ✓1}², where D² = D × D.
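As a concrete check, membership of a string in Σ for the binary set can be decided by replaying the actions against a state. This is a sketch: the two-character ASCII encoding of labels is ours, with 'Y'/'N' standing in for ✓/✗.

```python
def valid(sigma):
    """Decide whether a sequence of set actions is a specification string:
    '+k'/'-k' add/remove element k; 'Yk'/'Nk' are membership tests that
    must return true/false respectively."""
    present = set()
    for act in sigma:
        op, k = act[0], act[1]
        if op == '+':
            present.add(k)
        elif op == '-':
            present.discard(k)
        elif op == 'Y':
            if k not in present:
                return False          # test claimed present, but it is not
        elif op == 'N':
            if k in present:
                return False          # test claimed absent, but it is present
        else:
            raise ValueError(act)
    return True
```

For example, `valid(['+0', 'Y0', '-0', 'N0'])` holds, while `valid(['+0', 'N0'])` does not.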

The dependency relation for set is an equivalence, but this need not hold generally. We will see an example in Sect. 4.

The definitions in the rest of the paper assume that we have fixed a specification ⟨**L**, **M**, #, Σ⟩. In the examples of this section, we use set.

*State and Stuttering Equivalence.* Specification strings σ and τ are *state equivalent* (notation σ ≈ τ) if every valid extension of σ is also a valid extension of τ, and vice versa. For example, +0+1+0 ≈ +0+1 and +0-0+0 ≈ +0, but +0-0 ≉ +0. In particular, state-equivalent strings agree on the valid accessors that can immediately follow them: either ✓0 or ✗0, and either ✓1 or ✗1. Formally, we define state equivalence, ≈ ⊆ **L**∗ × **L**∗, as follows<sup>2</sup>.

$$(\sigma \approx \sigma') \triangleq (\sigma = \sigma') \text{ or } (\{\sigma, \sigma'\} \subseteq \Sigma \text{ and } \forall \tau \in \mathbf{L}^{*}.\ \sigma\tau \in \Sigma \text{ iff } \sigma'\tau \in \Sigma)$$

From specification property (b), we know that non-mutators do not affect the state. Thus we have that σa ≈ σ whenever a ∈ **M̄** and σa ∈ Σ. From specification property (c), we know that independent actions commute. Thus we have that σab ≈ σba whenever ¬(a # b) and {σab, σba} ⊆ Σ.

Two strings are *stuttering equivalent*<sup>3</sup> if they differ only in operations that have no effect on the state of the data structure, as given by Σ. Adapting Brookes (1996) to our notion of state equivalence, we define stuttering equivalence, ∼ ⊆ **L**∗ × **L**∗, to be the least equivalence relation generated by the following rules, where a and b range over **L**.

$$
\varepsilon \sim \varepsilon \qquad\quad \frac{\sigma \sim \sigma'}{\sigma a \sim \sigma' a} \qquad\quad \frac{\sigma \approx \sigma a}{\sigma \sim \sigma a} \qquad\quad \frac{\sigma b \sim \sigma \quad \neg(a \mathbin{\#} b)}{\sigma a b \sim \sigma a}
$$

The first rule above handles the empty string. The second rule allows stuttering in any context. The third rule motivates the name stuttering equivalence, for example allowing +0+0 ∼ +0. The last rule captures the equivalence generated by independent labels, for example allowing +0+1+0 ∼ +0+1 but not +0-0+0 ∼ +0-0. Using the properties of ≈ discussed above, we can conclude, for example, that +0✓0✓0+0-0✗0 ∼ +0-0.

Consider specification strings for a unary set over the value 0. Since stuttering equivalence allows us to remove both accessors and adjacent mutators with the same label, we deduce that the *canonical representatives* of the equivalence classes induced by ∼ are generated by the regular expression (+0)?(-0+0)∗(-0)?.
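The collapse to canonical form can be sketched as follows, assuming labels are encoded as two-character ASCII strings ('+0', '-0', with 'Y0'/'N0' standing in for the accessors); the encoding is ours.

```python
def canonical(sigma, k='0'):
    """Canonical representative of the stuttering class of sigma,
    restricted to the unary set over element k: accessors (and mutators
    of other elements) are dropped, and adjacent duplicate mutators are
    collapsed, yielding a string matching (+k)?(-k+k)*(-k)?."""
    out = []
    for act in sigma:
        if act[0] not in '+-' or act[1] != k:
            continue                      # drop accessors and other elements
        if out and out[-1] == act:
            continue                      # collapse stuttering duplicates
        out.append(act)
    return out
```

For example, `canonical(['+0', '+0', 'Y0', '-0', '-0', '+0'])` yields the alternating string `['+0', '-0', '+0']`.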

*Observational Subsequence.* Recall that ac is a *subsequence* of abc, although it is not a *prefix*. We write ≤seq for subsequence and ≤obs for *observational subsequence*, defined as follows.

$$
\sigma_1 \cdots \sigma_n \ \leq_{\mathsf{seq}}\ \tau_0 \sigma_1 \tau_1 \cdots \sigma_n \tau_n \qquad\qquad \sigma \leq_{\mathsf{obs}} \tau \ \text{ if }\ \exists \sigma' \sim \sigma.\ \exists \tau' \sim \tau.\ \sigma' \leq_{\mathsf{seq}} \tau'
$$

Note that observational subsequence includes both subsequence and stuttering equivalence: (≤obs) ⊇ (≤seq) ∪ (∼).

≤seq can be understood in isolation, whereas ≤obs can only be understood with respect to a given specification. In the remainder of the paper, the implied specification will be clear from context. ≤seq is a partial order, whereas ≤obs is only a preorder, since it is not antisymmetric.

Let aσ and bτ be canonical representatives of strings over the unary set. Then we have that aσ ≤obs bτ exactly when either a = b and |σ| ≤ |τ|, or a ≠ b and |σ| < |τ|. Thus, observational subsequence order is determined by the number of alternations between the mutators.

<sup>2</sup> To extend the definition to non-specification strings, we allow σ ≈ σ′ when σ = σ′.

<sup>3</sup> Readers of Brookes (1996) should note that mumbling is not relevant here, since all mutators are visible.

Specification strings for the binary set, then, are stuttering equivalent exactly when they yield the same canonical representatives when restricted to 0 and to 1. Thus, observational subsequence order is determined by the number of alternations between the mutators, when restricted to each dependent subsequence. (The final rule in the definition of stuttering, which allows stuttering across independent labels, is crucial to establishing this canonical form.)
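The characterization of ≤obs for the unary set can be computed directly on canonical representatives. This is a sketch; labels are encoded as two-character ASCII strings ('+0', '-0', accessors 'Y0'/'N0'), an encoding of ours.

```python
def canonical(sigma, k='0'):
    """Drop accessors and other elements; collapse adjacent duplicates."""
    out = []
    for act in sigma:
        if act[0] in '+-' and act[1] == k and not (out and out[-1] == act):
            out.append(act)
    return out

def obs_leq(sigma, tau, k='0'):
    """sigma <=obs tau for the unary set over k, via the alternation
    characterization: with canonical representatives a.s and b.t, require
    either a = b and |s| <= |t|, or a != b and |s| < |t|."""
    cu, cv = canonical(sigma, k), canonical(tau, k)
    if not cu:
        return True                   # the empty string precedes everything
    if not cv:
        return False
    if cu[0] == cv[0]:
        return len(cu) <= len(cv)
    return len(cu) < len(cv)
```

For example, +0-0 is not below -0+0: the canonical forms start differently but have the same number of alternations.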

### 3.3 Eventual Consistency

Eventual consistency is defined using the *cuts* of an execution and the *observational subsequence order* of the specification. As noted in Sects. 2.2 and 2.4, it is important that we not consider all cuts. Thus, before we define sec, we must define *dependent cuts*.

The *dependent restriction* of an execution v is defined v↾# = ⟨Ev, λv, ≤#v⟩, where d ≤#v e when λv(d) # λv(e) and d ≤v e. See Sect. 2.2 for an example of dependent restriction.

The *dependent cuts* of v are cuts of the dependent restriction. As discussed in Sect. 2.4, we only consider pointed cuts (with a single maximal element) for non-mutators. See Sect. 2.4 for an example.

$$\mathsf{cuts}_{\#}(v) = \left\{ u \in \mathsf{cuts}(v \upharpoonright \#) \ \middle|\ \forall e \in \mathsf{E}_{u}.\ \text{if } \lambda_{u}(e) \in \overline{\mathbf{M}} \text{ then } \mathsf{max}(u) = \{e\} \right\}$$

An execution v is *Strongly Eventually Consistent* (sec) for specification ⟨**L**, **M**, #, Σ⟩ iff there exists a function τ : cuts#(v) → Σ that satisfies the following.

– Linearization: ∀p ∈ cuts#(v). p linearizes to τ(p), and
– Monotonicity: ∀p, q ∈ cuts#(v). p ⊆ q implies τ(p) ≤obs τ(q).

A data structure implementation is sec if all of its executions are sec.

In Sect. 2, we gave several examples that are sec. See Sects. 2.4 and 2.5 for examples where τ is given explicitly. Section 2.4 also includes an example that is not sec.

The concerns raised in Sect. 2 are reflected in the definition.


– Monotonicity ensures that the system evolves in a sensible way: new order may be introduced, but old order cannot be forgotten. As discussed in Sect. 2.5, the preserved order is captured in the observational subsequence relation, which allows stuttering (Brookes 1996).

#### 3.4 Properties of Eventual Consistency

We discuss some basic properties of sec. For further analysis, see Sect. 5.

An important property of crdts is *prefix closure*: if an execution is valid, then every prefix of the execution should also be valid. Prefix closure follows immediately from the definition, since whenever u is a prefix of v we have that cuts#(u) ⊆ cuts#(v).

Prefix closure looks back in time. It is also possible to look forward: A system satisfies *eventual delivery* if every valid execution can be extended to a valid execution with a maximal element that sees every mutator. If one assumes that every specification string can be extended to a longer specification string by adding non-mutators, then eventual delivery is immediate.

The properties psts, psm and ppe are discussed in the introduction. An sec implementation must satisfy ppe since every dependent set of mutators is linearized; sec enforces the stronger property that there are no new intermediate states, even when executing all mutators in parallel. For causal systems, where ≤u is transitive, psts and psm follow by observing that if there is a total order on the mutators of u, then any linearization of u is a specification string.

Burckhardt (2014, Sect. 5) provides a taxonomy of correctness criteria for replicated data types. Our definition implies NoCircularCausality and CausalArbitration, but does not imply either ConsistentPrefix or CausalVisibility. For lpos, which model causal systems, our definition implies CausalVisibility. ReadMyWrites and MonotonicReads require a distinction between local and remote events. If one assumes the replica-specific constraints given in Sect. 3.1, then our definition satisfies these properties; without them, our definition is too abstract.

#### 3.5 Correctness of the Add-Wins Set

The add-wins set is defined to answer ✓k for a cut u exactly when

$$
\exists d \in u. \,\lambda\_u(d) = \text{\textbullet} \,\,\wedge \,\,\langle \exists e \in u. \,\,\lambda\_u(e) = \text{\textbullet} \,\,\wedge \,\, d \,\,\stackrel{\sim}{\sim} u \,\,e).
$$

It answers ✗k otherwise. The add-wins set is also known as the "observed-remove" set.
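The lookup rule can be sketched directly, assuming an ASCII encoding of labels ('+0', '-0') and a strict visibility relation given as a set of pairs; the encoding and function name are ours.

```python
def add_wins_lookup(events, label, before, k='0'):
    """The add-wins accessor rule: element k is deemed present at a cut
    iff some '+k' event is not strictly followed (in visibility order)
    by any '-k' event.  `before` holds the strict pairs (d, e)."""
    adds = [d for d in events if label[d] == '+' + k]
    removes = [e for e in events if label[e] == '-' + k]
    return any(all((d, e) not in before for e in removes) for d in adds)
```

With an add concurrent to a remove, the add wins; with every add preceding the remove, the element is absent.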

We show that any lpo that meets this specification is sec with respect to set. We restrict attention to lpos since causal delivery is assumed for the add-wins set in Shapiro et al. (2011a).

For set, the dependency relation is an equivalence. For an equivalence relation R, let **L**/R ⊆ 2^**L** denote the set of (disjoint) equivalence classes of R. For set, **L**/# = {{+0, -0, ✗0, ✓0}, {+1, -1, ✗1, ✓1}}. When dependency is an equivalence, *every* interleaving of independent actions is valid if *any* interleaving is valid. Formally, we have the following, where ∥ denotes interleaving.

$$\forall D \in (\mathbf{L}/\#).\ \forall \sigma \in D^{*}.\ \forall \tau \in (\mathbf{L} \setminus D)^{*}.\ (\sigma \parallel \tau) \cap \Sigma \neq \emptyset \ \text{ implies }\ (\sigma \parallel \tau) \subseteq \Sigma$$

Using the forthcoming composition result (Theorem 2), it suffices to address the case in which u only involves operations on a single element, say 0. For any such lvo u, we choose a linearization τ(u) ∈ (-0|+0)∗ that has a maximum number of alternations between -0 and +0. If there is such a linearization that begins with -0, then we choose one of those. Below, we summarize some of the key properties of such a linearization.


The first property above ensures that the accessors are validated correctly, *i.e.*, 0 is deemed to be present iff there is a +0 that is not followed by any -0.
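For small lvos, the chosen linearization can be found by brute force. This is a sketch: events are integers, labels are the ASCII strings '+0'/'-0', and the strict order is a set of pairs, all encodings of ours.

```python
from itertools import permutations

def alternations(s):
    """Number of alternations between adjacent labels in a string."""
    return sum(1 for a, b in zip(s, s[1:]) if a != b)

def choose_linearization(events, label, before):
    """Pick tau(u) for a single-element lvo: among all linearizations
    (total orders extending `before`), maximize the number of
    alternations, preferring one that begins with -0."""
    def respects(order):
        pos = {e: i for i, e in enumerate(order)}
        return all(pos[d] < pos[e] for (d, e) in before)
    lins = [tuple(label[e] for e in p)
            for p in permutations(events) if respects(p)]
    best = max(alternations(l) for l in lins)
    top = [l for l in lins if alternations(l) == best]
    starting_minus = [l for l in top if l[0] == '-0']
    return (starting_minus or top)[0]
```

For two concurrent events +0 and -0, both orders have one alternation, so the -0-first linearization is chosen, matching the add-wins answer ✓0.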

We are left with proving monotonicity, *i.e.*, that u ⊆ v implies τ(u) ≤obs τ(v). Consider τ(u) = aσ and τ(v) = bρ.


### 4 A Collaborative Text Editing Protocol

In this section we consider a variant of the collaborative text editing protocol defined by Attiya et al. (2016). After stating the sequential specification, text, we sketch a correctness proof with respect to our definition of eventual consistency. This example is interesting formally: the dependency relation is not an equivalence, and therefore the dependent projection does not preserve transitivity. The generality of intransitive lvos is necessary to understand text, even assuming a causal implementation.

*Specification.* Let a, b range over *nodes*, which contain some text, a unique identifier, and perhaps other information. Labels have the following forms:


We demonstrate the correct answers to queries by example. Initially, the document is empty, whereas after initialization, the document contains a single node; thus the specification contains strings such as "?ε !c ?c", where ε represents the empty document. Nodes can be added either before or after other nodes; thus "!c +b<c +d>c" results in the document ?bcd. Nodes are always added adjacent to the target; thus, order matters in "!c +e>c +d>c" which results in ?cde rather than ?ced. Removal does what one expects; thus "!c +e>c +d>c -c" results in ?de.
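The query answers above can be reproduced by a small interpreter for the mutators. This is a sketch; the single-letter node encoding and the function name are ours.

```python
def apply_ops(ops):
    """Interpret text mutators on a document represented as a list of
    nodes: '!a' initializes to a, '+a<b' adds a immediately before b,
    '+a>b' adds a immediately after b, '-a' removes a."""
    doc = []
    for op in ops:
        if op[0] == '!':
            doc = [op[1]]
        elif op[0] == '+':
            a, rel, b = op[1], op[2], op[3]
            i = doc.index(b)
            doc.insert(i if rel == '<' else i + 1, a)
        elif op[0] == '-':
            doc.remove(op[1])
    return ''.join(doc)
```

Note that order matters because nodes are always added adjacent to the target: "!c +e>c +d>c" yields cde, not ced.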

Attiya et al. (2016) define the interface for text using integer indices as targets, rather than nodes. Using the unique correspondence between nodes and their indices (since nodes are unique), one can easily adapt an implementation that satisfies our specification to their interface.

We say that node a is *added* in the actions !a, +a<b and +a>b. Node b is the *target* in +a<b and +a>b. In addition to correctly answering queries, specifications must satisfy the following constraints:


These constraints forbid adding to a target that has been removed; thus "!c +d>c -c" is a valid string, but "!c -c +d>c" is not. It also follows that initialization must precede all other mutators.

Because add operations use unique identifiers, punning and stuttering play little role in this example. In order to show the implementation correct, we need only choose an appropriate notion of dependency. As we will see, it is necessary that removes be independent of adds that mention disjoint sets of nodes, but otherwise all actions may be dependent. Let **L**!+? be the set of add and query labels, and let nodes(ℓ) return the set of nodes that appear in the label ℓ. Then we define dependency as follows.

$$\ell \mathbin{\#} k \ \text{ iff }\ \{\ell, k\} \subseteq \mathbf{L}_{!+?} \ \text{ or }\ \mathsf{nodes}(\ell) \cap \mathsf{nodes}(k) \neq \emptyset$$

*Implementation.* We consider executions that satisfy the same four conditions imposed on specifications above. We refer the reader to the algorithm of Attiya et al. (2016), which provides timestamps for insertions that are monotone with respect to causality.

As an example, Attiya et al. (2016) allow the execution given on the left below. In this case, the dependent restriction is an intransitive lvo, even though the underlying execution is an lpo: in particular, !b does not precede -d in the dependent restriction. We give the order considered by dependent cuts on the right; this is a restriction of the dependent restriction: since we only consider pointed accessor cuts, we can safely ignore order out of non-mutators.

This execution is not linearizable, but it is sec, choosing witnesses to be subsequences of the mutator string "!b +d>b +c>b +a<b +e>d -b -d". Here, the document is initialized to b, then c and d are added after b, resulting in ?bcd. The order of c and d is determined by their timestamps. Afterwards, the top replica removes d and adds a; the bottom replica removes b and adds e, resulting in the final state ?ace. In the right execution, the removal of order out of the non-mutators shows the "update serializability" effect; the removal of order between -b and +e>d (and between -d and +a<b) shows the "preserved program order" effect.

*Correctness.* Given an execution, we can find a specification string s1s2 that linearizes the mutators in the dependent restriction of the execution such that s1 contains only adds and s2 contains only removes. Such a specification string exists because, by the conditions on executions, removes have no outgoing edges to other mutators in the dependent restriction; thus, they can be moved to the end of the matching specification string. In order to find an s1 that linearizes the add events, any linearization that respects causality and the timestamps yielded by the algorithm of Attiya et al. (2016) suffices for our purposes. The conditions required by sec follow immediately.

### 5 Compositional Reasoning

The aim of this section is to establish compositional methods to reason about replicated data structures. We do so using *Labelled Transition Systems* (ltss), where the transitions are labelled by dependent cuts. We show how to derive an lts from an execution, lts(u). We also define an lts for the *most general* crdt that validates a specification, lts(Σ). We show that u is sec for Σ exactly when lts(u) is a refinement of lts(Σ). We use this alternative characterization to establish composition and abstraction results.

*LTSs.* An lts is a triple consisting of a set of states, an initial state and a labelled transition relation between states. We first define the ltss for executions and specifications, then provide examples and discussion.

For both executions and specifications, the labels of the lts are dependent cuts: for executions, these are dependent cuts of the execution itself; for specifications, they are drawn from the set L# = ⋃v cuts#(v) of all possible dependent cuts, where v ranges over all executions. We compare lts labels up to isomorphism, rather than identity. Thus it is safe to think of lts labels as (potentially intransitive) pomsets (Plotkin and Pratt 1997).

The states of the lts are different for the execution and specification. For executions, the states are cuts of the execution u itself, cuts(u); these are general cuts, not just dependent cuts. For specifications, the states are the stuttering equivalence classes of strings allowed by the specification, Σ/∼.

There is an isomorphism between strings and total orders. We make use of this in the definition, treating strings as totally-ordered lvos.

Define lts(u) = ⟨cuts(u), ∅, −→i⟩, where p −[v]→i q if v ∈ cuts#(q) and


Define lts(Σ) = ⟨Σ/∼, [ε], −→s⟩, where [σ] −[v]→s [ρ] if v ∈ L# and


We explain the definitions using examples from set, first for executions, then for specifications. Consider the execution on the left below. The derived lts is given on the right.

The states of the lts are cuts of the execution. The labels on transitions are *dependent* cuts. The requirements for execution transitions relate the source p, the target q and the label v. The leftmost requirements state that the target state must extend both the source and the label; thus the target state must be a combination of events and order from source and label. The middle requirements state that the maximal elements of the label must be new in the target; only the maximal elements of the label are added when moving from source to target. The upper-right requirement states that the non-maximal order of the label must be respected by the source; thus the causal history reported by the label cannot contradict the causal history of the source. The lower-right requirement ensures that maximal elements of the label are also maximal in the target. The restriction to dependent cuts explains the labels on transitions such as (-0-+0) −[+1]→i (-0-+0); +1 and (-0-+0); (✓0-+1) −[(-0+0);✓0]→i (-0-+0); (✓0-+1); ✓0. By definition, there is a self-transition labelled with the empty lvo at every state; we elide these transitions in drawings.

The specification lts for set is infinite, of course. To illustrate, below we give two sub-ltss with limitations on mutators. On the left, we only allow +0 and +1. On the right, we only allow +0 and -0 and only consider the case in which there is at most one alternation between them. The states are shown using their canonical representatives. Because of the number of transitions, we show all dependent accessors as a single transition, with labels separated by commas.

The requirements for specification transitions are similar to those for implementations, but the states are equivalence classes of specification strings: with source [σ] and target [τ]. There is a transition between the states if there are members of the equivalence classes, σ and τ, that satisfy the requirements. Since these are total orders, the leftmost requirements state that there must be linearizations of the source and label that are subsequences of the target. Similarly, the upper-right requirement states that the non-maximal order of the label must be respected by the source; thus we have +0 −[+0-0]→s +0-0 but not +0 −[-0+0]→s σ, for any σ. The use of sub-order rather than subsequence allows +0-0 −[+0-0]→s +0-0-0 but prevents nonsense transitions such as +0-0 −[+0-0]→s -0+0-0. Because the states are total orders, we drop the implementation lts requirement that maximal events of the label must be maximal in the target. If we were to impose this restriction, we would disallow -0 −[+0]→s +0-0.

It is worth noting that the specification of the add-wins set removes exactly three edges from the right lts: ε −[-0|+0]→s +0-0, +0 −[-0]→s +0-0, and -0 −[+0]→s +0-0.

*Refinement.* Refinement is a functional form of simulation (Hoare 1972; Lamport 1983; Lynch and Vaandrager 1995). Let P = ⟨SP, p0, −→P⟩ and Q = ⟨SQ, q0, −→Q⟩ be ltss. A function f : SP → SQ is a *(strong) refinement* if p −[v]→P p′ and f(p) = q imply that there exist w =iso v and q′ ∈ SQ such that q −[w]→Q q′ and f(p′) = q′. Then P *refines* Q (notation P ⊑ Q) if there exists a refinement f : SP → SQ such that the initial states are related, *i.e.*, f(p0) = q0.
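For the tiny ltss drawn in this section, refinement can be checked by brute force. This is a sketch: an lts is encoded as a triple of a state set, an initial state and a set of (source, label, target) transitions, and labels are compared by equality, standing in for comparison up to isomorphism; all names and encodings are ours.

```python
from itertools import product

def is_refinement(f, P, Q):
    """Check the refinement conditions for a candidate function f,
    given as a dict from P-states to Q-states."""
    sp, p0, tp = P
    sq, q0, tq = Q
    if f[p0] != q0:
        return False
    # Every P-transition must be matched from the image state in Q.
    return all(any(f[p] == q and w == v and f[p2] == q2
                   for (q, w, q2) in tq)
               for (p, v, p2) in tp)

def refines(P, Q):
    """Search all functions from P-states to Q-states for a refinement
    (exponential; only sensible for tiny examples)."""
    sp, p0, tp = P
    sq, q0, tq = Q
    states_p = sorted(sp)
    for image in product(sorted(sq), repeat=len(states_p)):
        f = dict(zip(states_p, image))
        if is_refinement(f, P, Q):
            return True
    return False
```

For instance, an lts with a single a-transition refines one that additionally allows a b-loop, but not vice versa.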

We now prove that sec can be characterized as a refinement. We write p0 −→∗P pn when pn is reachable from p0 via a finite sequence of steps pi −[ui]→P pi+1.

Theorem 1. *u is* sec *for the specification* Σ *iff* lts(u) ⊑ lts(Σ)*.*

*Proof.* For the forward direction, assume u is sec; therefore there exists a function τ : cuts#(u) → Σ such that ∀E ∈ cuts#(u). τ(E) is a linearization of E. For each cut p ∈ cuts(u), we start with the dependent restriction, p↾#. We further restrict attention to mutators, p↾#↾**M**. The required refinement maps p to the equivalence class of the linearization of p↾#↾**M** chosen by τ: f(p) ≜ [τ(p↾#↾**M**)]. Below, we abuse notation by identifying each equivalence class with a canonical element of the class.

We show that p −[v]→i q implies f(p) ≤obs f(q). Since p ⊆ q, we deduce that p↾#↾**M** ⊆ q↾#↾**M** and, by monotonicity, f(p) = τ(p↾#↾**M**) ≤obs τ(q↾#↾**M**) = f(q).

We show that p −[v]→i q implies τ(v) ≤obs f(q). Suppose v contains only mutators. Since v ⊆ q, we deduce that v ⊆ q↾#↾**M** and, by monotonicity, τ(v) ≤obs τ(q↾#↾**M**) = f(q). On the other hand, suppose v contains the non-mutator a. Let A = **M** ∪ {a}. Since v ⊆ q, we deduce that v↾**M** ⊆ q↾#↾A. By monotonicity, τ(v↾**M**) ≤obs τ(q↾#↾A). Since τ(q↾#↾A) = τ(q↾#↾**M**), we have τ(v↾**M**) ≤obs τ(q↾#↾**M**) = f(q), as required.

Thus f(p) −[v]→s f(q), completing this direction of the proof.

For the reverse direction, we are given a refinement f : cuts(u) → Σ/∼. For any p ∈ cuts#(u), define τ(p) to be a string in the equivalence class f(p) that includes any non-mutator found in p.

We first prove that τ(p) is a linearization of p. A simple inductive proof demonstrates that, for any p ∈ cuts#(u), there is a transition sequence of the form ∅ −→∗i p′ −[p]→i p. Thus, we deduce from the label on the final transition into p that the τ(p) related to p is a linearization of p.

We now establish monotonicity. A simple inductive proof shows that, for any p, q ∈ cuts(u), p ⊆ q implies p −→∗i q. Thus τ(p) ≤obs τ(q), by the properties of f and the definition of τ.

*Composition.* Given two *non-interacting* data structures whose replicated implementations satisfy their sequential specifications, the implementation that combines them satisfies the interleaving of their specifications. We formalize this as a composition theorem in the style of Herlihy and Wing (1990).

Given an execution u and L ⊆ **L**, write u↾L for the execution that results from restricting u to events with labels in L: u↾L = u↾{e ∈ Eu | λu(e) ∈ L}. This notation lifts to sets in the standard way: U↾L = {u↾L | u ∈ U}. Write u sec Σ to indicate that u is sec for Σ.

Theorem 2 (Composition). *Let* L1 *and* L2 *be mutually independent subsets of* **L**. *For* i ∈ {1, 2}, *let* Σi *be a specification with labels chosen from* Li, *such that* Σ1 ∥ Σ2 *is also a specification. If* (U↾L1) sec Σ1 *and* (U↾L2) sec Σ2 *then* U sec (Σ1 ∥ Σ2) *(equivalently* lts(Σ1) ∥ lts(Σ2) ⊑ lts(Σ1 ∥ Σ2)*).*

The proof is immediate. Since L<sup>1</sup> and L<sup>2</sup> are mutually independent, any interleaving of the labels will satisfy the definition.

*Abstraction.* We describe a process algebra with parallel composition and restriction and establish congruence results. We ignore syntactic details and work directly with ltss. Replica identities do not play a role in the definition; thus, we permit implicit mobility of the client amongst replicas with the only constraint being that the replica has at least as much history on the current item of interaction as the client. This constraint is enforced by the synchronization of the labels, defined below. While the definition includes the case where the client itself is replicated, it does not provide for out-of-band interaction between the clients at different replicas: All interaction is assumed to happen through the data structure.

The relation ∥ is defined between ltss so that P ∥ Q describes the system that results when client P interacts with data structure Q. For ltss P and Q, define −→× inductively, as follows, where ∅ represents the empty lvo.

$$\frac{q \xrightarrow{v}_{Q} q'}{\langle p,q\rangle \xrightarrow{v}_{\times} \langle p,q'\rangle} \qquad\qquad \frac{p \xrightarrow{v}_{P} p' \qquad q \xrightarrow{w}_{Q} q' \qquad \exists v' =_{\mathsf{iso}} v.\ v' \subseteq w \ \text{and}\ \mathsf{max}(v') = \mathsf{max}(w)}{\langle p,q\rangle \xrightarrow{w}_{\times} \langle p',q'\rangle}$$

Let
$$S\_{\times} = \{ \langle p,q \rangle \mid \exists p', q'.\; \langle p,q \rangle \longrightarrow^{\*}\_{\times} \langle p',q' \rangle \text{ and } \neg\exists v, p''.\; p' \xrightarrow{v}\_P p'' \}$$

$$P \parallel Q = \begin{cases} \{ \langle S\_{\times}, \langle p\_0, q\_0 \rangle, \longrightarrow\_{\times} \rangle \} & \text{if } S\_{\times} \text{ is non-empty} \\ \emptyset & \text{otherwise} \end{cases}$$

The ‖ operator is asymmetric between the client and data structure in two ways. First, note that every action of the client must be matched by the data structure. The condition of client quiescence in the definition of S× requires that all of the actions of the client P be matched by Q; otherwise P ‖ Q = ∅. However, the first rule for −→× explicitly permits actions of the data structure that are not matched by the client. This asymmetry permits the composition of the data structure with multiple clients to be described incrementally, one client at a time. Thus, we expect that (P₁ ‖ P₂) ‖ Q = P₁ ‖ (P₂ ‖ Q).

Second, note that the right rule for −→× permits the data structure Q to introduce order not found in the clients. This is clearly necessary to ensure that the composition of the client ✓0 | +0 with the set data structure is nonempty. In this case, the client has no order between +0 and ✓0, whereas the data structure orders ✓0 after +0. In this paper, we do not permit the client to introduce order that is not seen in the data structure. For a discussion of this issue, see Jagadeesan and Riely (2015).

We can also define restriction for some set A ⊆ **L** of labels, à la CCS: P\A = ⟨S_P, p₀, {(p, v, q) | (p, v, q) ∈ (−→_P) and labels(v) ∩ A = ∅}⟩. The definitions lift to sets: 𝒫 ‖ 𝒬 = ⋃{P ‖ Q | P ∈ 𝒫, Q ∈ 𝒬} and 𝒫\A = {(P\A) | P ∈ 𝒫}.
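To illustrate the shape of these definitions, here is a deliberately simplified sketch (our own model, not the paper's exact rules: labels are atomic and synchronization is label equality rather than lvo embedding) of restriction and of the reachable states of the client/data-structure product.

```python
# Illustrative sketch: an lts as (states, initial, transitions) with
# transitions as (p, label, p') triples.

def restrict(lts, hidden):
    """P \\ A: drop every transition whose label lies in the set A."""
    states, init, trans = lts
    return (states, init, {(p, v, q) for (p, v, q) in trans if v not in hidden})

def product_states(client, store):
    """Reachable states of the product: the data structure may move alone,
    but every client action must synchronize on a matching label."""
    (_, p0, ct), (_, q0, st) = client, store
    seen, todo = {(p0, q0)}, [(p0, q0)]
    while todo:
        p, q = todo.pop()
        steps = [(p, q2) for (q1, v, q2) in st if q1 == q]          # store alone
        steps += [(p2, q2) for (p1, v, p2) in ct if p1 == p
                  for (q1, w, q2) in st if q1 == q and v == w]      # synchronized
        for s in steps:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return seen

client = ({"a", "b"}, "a", {("a", "+0", "b")})
store = ({"x", "y", "z"}, "x", {("x", "+0", "y"), ("y", "-1", "z")})
print(("b", "y") in product_states(client, store))  # True
```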

Lemma 3. *If* P ⊑ P′ *and* Q ⊑ Q′ *then* P ‖ Q ⊑ P′ ‖ Q′ *and* P\A ⊑ P′\A*.*

It suffices to show that P ⊑ lts(u) implies P ‖ lts(u) ⊑ P ‖ lts(Σ). The proof proceeds in the traditional style of such proofs in process algebra. We illustrate by sketching the case for client parallel composition. Let f be the witness for P ⊑ lts(u). The proof proceeds by constructing a "product" refinement relation S of the identity on the states of P with f, *i.e.* f(q) = q′ implies ⟨p, q⟩ S ⟨p, q′⟩.

Thus, an sec implementation can be replaced by the specification.

Theorem 4 (Abstraction). *If* u *is* sec *for* Σ*, then* P ‖ lts(u) ⊑ P ‖ lts(Σ)*.*

### 6 A Replicated Graph Algorithm

We describe a graph implemented with sets for vertices and edges, as specified by Shapiro et al. (2011a). The graph maintains the invariant that the vertices of an edge are also part of the graph. Thus, an edge may be added only if the corresponding vertices exist; conversely, a vertex may be removed only if it supports no edge. In the case of a concurrent addition of an edge with the deletion of either of its vertices, the deletion takes precedence.

The vertices v, w, ... are drawn from some universe U. An edge e, e′, ... is a pair of vertices. Let vert(e) = {v, w} be the vertices of edge e = (v, w). The vocabulary of the set specification includes mutators for the addition and removal of vertices and edges and non-mutators for membership tests.

$$\begin{aligned} \mathbf{M} &= \{ +v,\; -v,\; +(v,w),\; -(v,w) \mid v, w \in U \} \\ \overline{\mathbf{M}} &= \{ \checkmark v,\; \checkmark(v,w) \mid v, w \in U \} \\ \# &= \{ (e,v), (v,e) \mid v \in \mathsf{vert}(e) \} \cup \{ (e,e') \mid \mathsf{vert}(e) \cap \mathsf{vert}(e') \neq \emptyset \} \end{aligned}$$

Valid graph specification strings answer queries like sets. In addition, we require the following.


*Graph Implementation.* We rewrite the graph program of Shapiro et al. (2011a) in a more abstract form. Our distributed graph implementation is written as a client of two replicated sets: one for vertices (V) and one for edges (E). The implementation uses usets, which require that an element be added at most once and that each remove causally follow the corresponding add. Here we show the graph implementation for various methods as client code that runs at each replica. At each replica, the code accesses its local copy of the usets. All the message passing needed to propagate the updates is handled by the uset implementations of the sets V, E. For several methods, we list preconditions, which prescribe the natural assumptions that need to be satisfied when these client methods are invoked. For example, an edge operation requires the presence of the vertices at the current replica.
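A minimal sketch of such client code at a single replica (all names are hypothetical; the local Python sets stand in for the replica's copies of the replicated usets V and E, and all message passing is elided):

```python
# Illustrative single-replica sketch of the graph client: it manipulates
# its local copies of the two replicated sets and checks the stated
# preconditions before each update.

class GraphReplica:
    def __init__(self):
        self.V = set()   # local copy of the replicated vertex set
        self.E = set()   # local copy of the replicated edge set

    def add_vertex(self, v):
        self.V.add(v)

    def add_edge(self, v, w):
        # precondition: both endpoints must exist at the current replica
        assert v in self.V and w in self.V, "endpoints must be present"
        self.E.add((v, w))

    def remove_vertex(self, v):
        # precondition: the vertex must support no edge
        assert all(v not in (a, b) for (a, b) in self.E), "vertex still in use"
        self.V.discard(v)

    def lookup_edge(self, v, w):
        # an edge counts as present only if both endpoints are, so a
        # concurrent vertex deletion takes precedence over an edge addition
        return (v, w) in self.E and v in self.V and w in self.V

g = GraphReplica()
g.add_vertex(1); g.add_vertex(2); g.add_edge(1, 2)
print(g.lookup_edge(1, 2))  # True
```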


We assume a causal transition system (as needed in Shapiro et al. (2011a)).

*Correctness Using the Set Specification.* We first show the correctness of the graph algorithm, using the set specification for the vertex and edge sets. We then apply the abstraction and composition theorems to show the correctness of the algorithm using a set implementation.

Let u be a lvo generated in an execution of the graph implementation. The preconditions ensure that u has the following properties:


Define σ₁, σ₂ and σ₃ as follows.


Then u is sec with witness σᵤ = σ₁σ₂σ₃.

*Full Correctness of the Implementation.* We now turn to proving the correctness of the algorithm when the two sets are replaced by their implementations.

Consider two (distributed implementations of) separate and independent sets for vertices and edges, *i.e.* **L**Σ₁ ∩ **L**Σ₂ = ∅. Suppose we have two implementations, each of which is correct individually: lts(Uᵢ) ⊑ lts(Σᵢ). By composition, we have that they are correct when composed together: U₁ ‖ U₂ ⊑ Σ₁ ‖ Σ₂. Let 𝒫 be the graph implementation, which is a client of the two sets. By abstraction, we know that 𝒫 ‖ (Σ₁ ‖ Σ₂) ⊑ T implies 𝒫 ‖ (U₁ ‖ U₂) ⊑ T. By congruence, we deduce:

$$(\mathcal{P} \parallel (\Sigma\_1 \parallel \Sigma\_2)) \backslash (\mathbf{L}\_{\Sigma\_1} \cup \mathbf{L}\_{\Sigma\_2}) \sqsubseteq T \text{ implies } (\mathcal{P} \parallel (U\_1 \parallel U\_2)) \backslash (\mathbf{L}\_{\Sigma\_1} \cup \mathbf{L}\_{\Sigma\_2}) \sqsubseteq T.$$

Thus, in order to validate the full graph implementation, it is sufficient to establish the correctness of the graph client when interacting with the *specification* of the two independent sets for edges and vertices, which we have already done in the previous treatment of abstract correctness.

### 7 Conclusions

We have provided a definition of *strong eventual consistency* that captures *validity* with respect to a *sequential specification*. Our definition reflects an attempt to resolve the tension between expressivity (cover the extant examples in the literature) and facilitating reasoning (by retaining a direct relationship with the sequential specification). The notion of *concurrent specification* developed by Burckhardt et al. (2014) has been used to prove the validity of several replicated data structure implementations. In future work, we would like to discover sufficient conditions relating concurrent and sequential specifications such that any implementation that is correct under the concurrent specification (as defined by Burckhardt et al. (2014)) will also be correct under the sequential counterpart (as defined here).

Acknowledgements. This paper has been greatly improved by the comments of the anonymous reviewers.

This material is based upon work supported by the National Science Foundation under Grant No. 1617175. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Compiler Verification

### **A Verified Compiler from Isabelle/HOL to CakeML**

Lars Hupel and Tobias Nipkow

Technische Universität München, Munich, Germany lars.hupel@tum.de, nipkow@in.tum.de

**Abstract.** Many theorem provers can generate functional programs from definitions or proofs. However, this code generation needs to be trusted, with the exception of the HOL4 system, which has a proof-producing code generator for a subset of ML. We go one step further and provide a verified compiler from Isabelle/HOL to CakeML. More precisely, we combine a simple proof-producing translation of recursion equations in Isabelle/HOL into a deeply embedded term language with a fully verified compilation chain to the target language CakeML.

**Keywords:** Isabelle · CakeML · Compiler · Higher-order term rewriting

### **1 Introduction**

Many theorem provers have the ability to generate executable code in some (typically functional) programming language from definitions, lemmas and proofs (e.g. [6,8,9,12,16,27,37]). This makes code generation part of the trusted kernel of the system. Myreen and Owens [30] closed this gap for the HOL4 system: they have implemented a tool that translates from HOL4 into *CakeML*, a subset of SML, and proves a theorem stating that a result produced by the CakeML code is correct w.r.t. the HOL functions. They also have a verified implementation of CakeML [24,40]. We go one step further and provide a once-and-for-all verified compiler from (deeply embedded) function definitions in Isabelle/HOL [32,33] into CakeML proving partial correctness of the generated CakeML code w.r.t. the original functions. This is like the step from dynamic to static type checking. It also means that preconditions on the input to the compiler are explicitly given in the correctness theorem rather than implicitly by a failing translation. To the best of our knowledge this is the first verified (as opposed to certifying) compiler from function definitions in a logic into a programming language.

Our compiler is composed of multiple phases and in principle applicable to other languages than Isabelle/HOL or even HOL:


The compiler operates in three stages:


The first two stages are preprocessing; they are implemented in ML and produce certificate theorems. Only these stages are specific to Isabelle. The third (and main) stage is implemented completely in the logic HOL, without recourse to ML. Its correctness is verified once and for all.<sup>1</sup>

### **2 Related Work**

There is existing work in the Coq [2,15] and HOL [30] communities for proof producing or verified extraction of functions defined in the logic. Anand *et al.* [2] present work in progress on a verified compiler from Gallina (Coq's specification language) via untyped intermediate languages to CompCert C light. They plan to connect their extraction routine to the CompCert compiler [26].

Translation of type classes into dictionaries is an important feature of Haskell compilers. In the setting of Isabelle/HOL, this has been described by Wenzel [44] and Krauss *et al.* [23]. Haftmann and Nipkow [17] use this construction to compile HOL definitions into target languages that do not support type classes, e.g. Standard ML and OCaml. In this work, we provide a certifying translation that eliminates type classes inside the logic.

Compilation of pattern matching is well understood in the literature [3,36,38]. In this work, we contribute a transformation of sets of equations with pattern matching on the left-hand side into a single equation with nested pattern matching on the right-hand side. This is implemented and verified inside Isabelle.

Besides CakeML, there are many projects for verified compilers for functional programming languages of various degrees of sophistication and realism (e.g.

<sup>1</sup> All Isabelle definitions and proofs can be found on the paper website: https:// lars.hupel.info/research/codegen/, or archived as https://doi.org/10.5281/zenodo. 1167616.

[4,11,14]). Particularly modular is the work by Neis *et al.* [31] on a verified compiler for an ML-like imperative source language. The main distinguishing feature of our work is that we start from a set of higher-order recursion equations with pattern matching on the left-hand side rather than a lambda calculus with pattern matching on the right-hand side. On the other hand we stand on the shoulders of CakeML which allows us to bypass all complications of machine code generation. Note that much of our compiler is not specific to CakeML and that it would be possible to retarget it to, for example, Pilsner abstract syntax with moderate effort.

Finally, Fallenstein and Kumar [13] have presented a model of HOL inside HOL using large cardinals, including a reflection proof principle.

### **3 Deep Embedding**

Starting with a HOL definition, we derive a new, *reified* definition in a deeply embedded term language depicted in Fig. 1a. This term language corresponds closely to the term datatype of Isabelle's implementation (using de Bruijn indices [10]), but without types and schematic variables.

To establish a formal connection between the original and the reified definitions, we use a *logical relation*, a concept that is well understood in the literature [20] and can be nicely implemented in Isabelle using type classes. Note that the use of type classes here is restricted to correctness proofs; it is not required for the execution of the compiler itself. That way, there is no contradiction to the elimination of type classes occurring in a previous stage.

*Notation.* We abbreviate App t u to *t* \$ *u* and Abs t to Λ t. Other term types introduced later in this paper use the same conventions. We reserve λ for abstractions in HOL itself. Typing judgments are written with a double colon: t :: τ.

*Embedding Operation.* Embedding is implemented in ML. We denote this operation using angle brackets: ⟨t⟩, where t is an arbitrary HOL expression and the result ⟨t⟩ is a HOL value of type term. It is a purely syntactic transformation, without preliminary evaluation or reduction, and it discards type information. The following examples illustrate this operation and typographical conventions concerning variables and constants:

⟨x⟩ = Free "x"    ⟨f⟩ = Const "f"    ⟨λx. f x⟩ = Λ (⟨f⟩ \$ Bound 0)
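The embedding can be mimicked on a toy de Bruijn term type (a sketch with our own names; the real operation is implemented in ML inside Isabelle and works on Isabelle's term datatype):

```python
# Toy term type mirroring Fig. 1a (Const / Free / Bound / App / Abs) and a
# purely syntactic embedding that discards no structure but knows no types.
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Free:
    name: str

@dataclass(frozen=True)
class Bound:
    idx: int

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Abs:
    body: object

def embed(expr, bound=(), consts=frozenset()):
    """Embed a nested-tuple surface syntax: strings are variables or
    constants, ("lam", x, body) is abstraction, (f, a) is application."""
    if isinstance(expr, str):
        if expr in bound:
            return Bound(bound.index(expr))   # innermost binder gets index 0
        return Const(expr) if expr in consts else Free(expr)
    if expr[0] == "lam":
        _, x, body = expr
        return Abs(embed(body, (x,) + bound, consts))
    f, a = expr
    return App(embed(f, bound, consts), embed(a, bound, consts))

# embedding of "lambda x. f x", with f a known constant
print(embed(("lam", "x", ("f", "x")), consts=frozenset({"f"})))
```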

*Small-Step Semantics.* Figure 1b specifies the small-step semantics for term. It is reminiscent of *higher-order term rewriting*, and modelled closely after equality in HOL. The basic idea is that if the proposition t = u can be proved equationally in HOL (without symmetry), then R ⊢ t −→* u holds (where *R* :: (term × term) set). We call *R* the *rule set*. It is the result of translating a set of defining equations *lhs* = *rhs* into pairs (*lhs*, *rhs*) ∈ *R*.

**Fig. 1.** Basic syntax and semantics of the term type

Rule Step performs a rewrite step by picking a rewrite rule from R and rewriting the term at the root. For that purpose, match and subst are (mostly) standard first-order matching and substitution (see Sect. 4 for details).

Rule Beta performs β-reduction. Type term represents bound variables by de Bruijn indices. The notation t[t′] represents the substitution of the outermost bound variable in t with t′.

Our semantics does not constitute a fully-general higher-order term rewriting system, because we do not allow substitution under binders. For de Bruijn terms, this would pose no problem, but as soon as we introduce named bound variables, substitution under binders requires dealing with capture. To avoid this altogether, all our semantics expect terms that are substituted into abstractions to be closed. However, this does not mean that we restrict ourselves to any particular evaluation order. Both call-by-value and call-by-name can be used in the small-step semantics. But later on, the target semantics will only use call-by-value.
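To make the Step rule concrete, here is a small executable sketch on a first-order fragment (our own toy representation, not the paper's term type): rules rewrite at the root via standard matching and substitution; β-reduction and rewriting under congruences are omitted.

```python
# Terms are nested tuples, pattern variables are strings starting with "?".

def match(pattern, term, sigma=None):
    """Standard first-order matching; returns a substitution or None."""
    sigma = dict(sigma or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in sigma and sigma[pattern] != term:
            return None                      # repeated variable, clash
        sigma[pattern] = term
        return sigma
    if isinstance(pattern, tuple) and isinstance(term, tuple) \
            and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            sigma = match(p, t, sigma)
            if sigma is None:
                return None
        return sigma
    return sigma if pattern == term else None

def subst(sigma, term):
    """Replace pattern variables by their bindings."""
    if isinstance(term, str) and term in sigma:
        return sigma[term]
    if isinstance(term, tuple):
        return tuple(subst(sigma, t) for t in term)
    return term

def step(R, term):
    """One root rewrite step: pick the first rule whose lhs matches."""
    for lhs, rhs in R:
        sigma = match(lhs, term)
        if sigma is not None:
            return subst(sigma, rhs)
    return None

# map-style rules, written first-order for the sketch:
R = [(("map", "?f", ("cons", "?x", "?xs")),
      ("cons", ("?f", "?x"), ("map", "?f", "?xs"))),
     (("map", "?f", "nil"), "nil")]
print(step(R, ("map", "g", "nil")))  # nil
```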

*Embedding Relation.* We denote the concept that an embedded term ⟨t⟩ corresponds to a HOL term a of type τ w.r.t. rule set *R* with the syntax R ⊢ ⟨t⟩ ≈ a. If we want to be explicit about the type, we index the relation: ≈τ.

For ground types, this can be defined easily. For example, the following two rules define ≈nat:

$$\frac{}{R \vdash \langle 0 \rangle \approx\_{\mathsf{nat}} 0} \qquad \frac{R \vdash \langle t \rangle \approx\_{\mathsf{nat}} n}{R \vdash \langle \mathsf{Suc}\ t \rangle \approx\_{\mathsf{nat}} \mathsf{Suc}\ n}$$

Definitions of ≈ for arbitrary datatypes without nested recursion can be derived mechanically in the same fashion as for nat, where they constitute one-to-one relations. Note that for ground types, ≈ ignores *R*. The reason why ≈ is parametrized on *R* will become clear in a moment.

For function types, we follow Myreen and Owens' approach [30]. The statement R ⊢ t ≈ f can be interpreted as "t \$ a can be rewritten to f a for all a". Because this might involve applying a function definition from *R*, the ≈ relation must be indexed by the rule set. As a notational convenience, we define another relation R ⊢ t ↓ x to mean that there is a t′ such that R ⊢ t −→* t′ and R ⊢ t′ ≈ x. Using this notation, we formally define ≈ for functions as follows:

$$R \vdash t \approx f \leftrightarrow (\forall u\ x.\; R \vdash u \downarrow x \to R \vdash t \mathbin{\\$} u \downarrow f\ x)$$

*Example.* As a running example, we will use the map function on lists:

$$\begin{aligned} \mathsf{map}\ f\ [\,] &= [\,] \\ \mathsf{map}\ f\ (x \mathbin{\#} xs) &= f\ x \mathbin{\#} \mathsf{map}\ f\ xs \end{aligned}$$

The result of embedding this function is a set of rules map′:

map' =

```
{(Const "List.list.map" $ Free "f" $ (Const "List.list.Cons" $ Free "x21" $ Free "x22"),
   Const "List.list.Cons" $ (Free "f" $ Free "x21") $ ...),
(Const "List.list.map" $ Free "f" $ Const "List.list.Nil",
   Const "List.list.Nil")}
```
together with the theorem map′ ⊢ Const "List.list.map" ↓ map, which is proven by simple induction over map. Constant names like "List.list.map" come from the fully-qualified internal names in HOL.

The induction principle for the proof arises from the use of the **fun** command that is used to define recursive functions in HOL [22]. But the user is also allowed to specify custom equations for functions, in which case we will use heuristics to generate and prove the appropriate induction theorem. For simplicity, we will use the term *(defining) equation* uniformly to refer to any set of equations, either default ones or ones specified by the user. Embedding partially-specified functions – in particular, proving the certificate theorem about them – is currently not supported. In the future, we plan to leverage the domain predicate as produced by **fun** to generate conditional theorems.

### **4 Terms, Matching and Substitution**

The compiler transforms the initial term type (Fig. 1a) through various intermediate stages. This section gives an overview and introduces necessary terminology.

*Preliminaries.* The function arrow in HOL is ⇒. The cons operator on lists is the infix #.

Throughout the paper, the concept of *mappings* is pervasive: we use the type notation α ⇀ β to denote a function α ⇒ β option. In certain contexts, a mapping may also be called an *environment*. We write mapping literals using brackets: [a ↦ x, b ↦ y, ...]. If it is clear from the context that σ is defined on a, we often treat the lookup σ a as returning an x :: β.

The functions dom :: (α ⇀ β) ⇒ α set and range :: (α ⇀ β) ⇒ β set return the *domain* and *range* of a mapping, respectively.

Dropping entries from a mapping is denoted by σ − k, where σ is a mapping and k is either a single key or a set of keys. We use σ ⊆ σ′ to denote that σ is a sub-mapping of σ′, that is, dom σ ⊆ dom σ′ and ∀a ∈ dom σ. σ a = σ′ a.

Merging two mappings σ and ρ is denoted with σ ++ ρ. It constructs a new mapping with the union domain of σ and ρ. Entries from ρ override entries from σ. That is, ρ <sup>⊆</sup> σ ++ ρ holds, but not necessarily σ <sup>⊆</sup> σ ++ ρ.
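For readers who want to experiment, these mapping operations correspond closely to finite dictionaries; a Python sketch (names are ours):

```python
# Dictionary sketches of dom, range, dropping, sub-mapping, and the
# right-biased merge ++ described above.

def dom(m):
    return set(m.keys())

def rng(m):
    return set(m.values())

def drop(m, keys):
    """sigma - k for a single key or a set of keys."""
    keys = keys if isinstance(keys, set) else {keys}
    return {a: x for a, x in m.items() if a not in keys}

def submapping(m, n):
    """m is a sub-mapping of n: dom m is a subset of dom n and the two
    mappings agree on dom m."""
    return all(a in n and n[a] == x for a, x in m.items())

def merge(m, n):
    """m ++ n: union of the domains, entries of n override entries of m."""
    return {**m, **n}

sigma = {"a": 1, "b": 2}
rho = {"b": 3}
print(merge(sigma, rho))  # {'a': 1, 'b': 3}
```

Note that, as stated above, `submapping(rho, merge(sigma, rho))` holds, but `submapping(sigma, merge(sigma, rho))` does not.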

All mappings and sets are assumed to be finite. In the formalization, this is enforced by using subtypes of ⇀ and set. Note that one cannot define datatypes by recursion through sets for cardinality reasons. However, for finite sets, it is possible. This is required to construct the various term types. We leverage facilities of Blanchette *et al.*'s **datatype** command to define these subtypes [7].

*Standard Functions.* All type constructors that we use (⇀, set, list, option, ...) support the standard operations map and rel. For lists, map is the regular covariant map. For mappings, the function has the type (β ⇒ γ) ⇒ (α ⇀ β) ⇒ (α ⇀ γ). It leaves the domain unchanged, but applies a function to the range of the mapping.

Function rel_τ lifts a binary predicate P :: α ⇒ α ⇒ bool to the type constructor τ. We call this lifted relation the *relator* for a particular type.

For datatypes, its definition is structural, for example:

$$\frac{}{\mathsf{rel}\_{\mathsf{list}}\ P\ [\,]\ [\,]} \qquad \frac{P\ x\ y \quad \mathsf{rel}\_{\mathsf{list}}\ P\ xs\ ys}{\mathsf{rel}\_{\mathsf{list}}\ P\ (x \mathbin{\#} xs)\ (y \mathbin{\#} ys)}$$

For sets and mappings, the definition is a little bit more subtle.

**Definition 1 (Set relator).** *For each element* a ∈ A*, there must be a corresponding element* b ∈ B *such that* P a b*, and vice versa. Formally:*

$$\mathsf{rel}\_{\mathsf{set}} \, P \, A \, B \leftrightarrow (\forall x \in A. \, \exists y \in B. \, P \, x \, y) \land (\forall y \in B. \, \exists x \in A. \, P \, x \, y)$$

**Definition 2 (Mapping relator).** *For each* a*,* m a *and* n a *must be related according to* rel_option P*. Formally:*

$$\mathsf{rel}\_{\mathsf{mapping}} P \; m \; n \leftrightarrow \left( \forall a . \; \mathsf{rel}\_{\mathsf{option}} \; P \; (m \; a) \; (n \; a) \right)$$
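Both relators are directly executable on finite sets and dictionaries; a sketch under an obvious encoding (predicates as Python functions, values assumed non-None so that absence can stand for None):

```python
# Executable sketches of the set and mapping relators from Definitions 1
# and 2, with an option relator as the auxiliary notion.

def rel_set(P, A, B):
    """Every element of A has a P-related partner in B and vice versa."""
    return all(any(P(x, y) for y in B) for x in A) and \
           all(any(P(x, y) for x in A) for y in B)

def rel_option(P, a, b):
    """None relates to None; two present values relate when P holds."""
    if a is None or b is None:
        return a is None and b is None
    return P(a, b)

def rel_mapping(P, m, n):
    """For each key a, m a and n a must be related by rel_option P."""
    return all(rel_option(P, m.get(a), n.get(a))
               for a in set(m) | set(n))

same_parity = lambda x, y: x % 2 == y % 2
print(rel_set(same_parity, {1, 2}, {3, 4}))           # True
print(rel_mapping(same_parity, {"a": 1}, {"a": 3}))   # True
```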

*Term Types.* There are four distinct term types: term, nterm, pterm, and sterm. All of them support the notions of free variables, matching and substitution. Free variables are always a finite set of strings. Matching a term against a *pattern* yields an optional mapping of type string ⇀ α from free variable names to terms.

Note that the type of patterns is itself term instead of a dedicated pattern type. The reason is that we have to subject patterns to a linearity constraint anyway and may use this constraint to carve out the relevant subset of terms:

**Definition 3.** *A term is* linear *if there is at most one occurrence of any variable, it contains no abstractions, and in an application* f \$ x*,* f *must not be a free variable. The HOL predicate is called* linear ::term ⇒ bool*.*

Because of the similarity of operations across the term types, they are all instances of the term type class. Note that in Isabelle, classes and types live in different namespaces. The term type and the term type class are separate entities.

**Definition 4.** *A* term type τ *supports the operations* match :: term ⇒ τ ⇒ (string ⇀ τ) option*,* subst :: (string ⇀ τ) ⇒ τ ⇒ τ *and* frees :: τ ⇒ string set*. We also define the following derived functions:*


Additionally, some (obvious) axioms have to be satisfied. We do not strive to fully specify an abstract term algebra. Instead, the axioms are chosen according to the needs of this formalization.

A notable deviation from matching as discussed in term rewriting literature is that the result of matching is only well-defined if the pattern is linear.
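The linearity predicate of Definition 3 can be sketched on a toy first-order term representation (our own encoding; the paper's predicate is linear :: term ⇒ bool inside HOL):

```python
# Toy terms: ("const", name), ("var", name), ("app", f, x).

def frees(t):
    """All free-variable occurrences of a term, with duplicates."""
    kind = t[0]
    if kind == "var":
        return [t[1]]
    if kind == "app":
        return frees(t[1]) + frees(t[2])
    return []

def linear(t):
    """At most one occurrence of each variable, no abstractions, and the
    head of an application must not be a free variable."""
    if t[0] == "abs":
        return False
    if t[0] == "app":
        if t[1][0] == "var":
            return False                  # free variable in head position
        if not linear(t[1]) or not linear(t[2]):
            return False
    vs = frees(t)
    return len(vs) == len(set(vs))

# Cons x xs is a fine (linear) pattern; f x is not, nor is P x x.
pat = ("app", ("app", ("const", "Cons"), ("var", "x")), ("var", "xs"))
print(linear(pat))  # True
print(linear(("app", ("var", "f"), ("var", "x"))))  # False
```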

**Definition 5.** *An* equation *is a pair of a pattern* (left-hand side) *and a term* (right-hand side)*. The pattern is of the form* f \$ p₁ \$ ... \$ pₙ*, where* f *is a constant (i.e. of the form* Const *name). We refer to both* f *and* name *interchangeably as the* function symbol *of the equation.*

Following term rewriting terminology, we sometimes refer to an equation as *rule*.

### **4.1 De Bruijn terms (term)**

The definition of term is almost an exact copy of Isabelle's internal term type, with the notable omissions of type information and schematic variables (Fig. 1a). The implementation of β-reduction is straightforward via index shifting of bound variables.

### **4.2 Named Bound Variables (nterm)**

**datatype** nterm = Nconst string | Nvar string | Nabs string nterm | Napp nterm nterm

The nterm type is similar to term, but removes the distinction between *bound* and *free* variables. Instead, there are only named variables. As mentioned in the previous section, we forbid substitution of terms that are not closed in order to avoid capture. This is also reflected in the syntactic side conditions of the correctness proofs (Sect. 5.1).

### **4.3 Explicit Pattern Matching (pterm)**

#### **datatype** pterm =

Pconst string | Pvar string | Pabs ((term × pterm) set) | Papp pterm pterm

Functions in HOL are usually defined using *implicit* pattern matching, that is, the terms pᵢ occurring on the left-hand side f p₁ ... pₙ of an equation must be constructor patterns. This is also common among functional programming languages like Haskell or OCaml. CakeML only supports *explicit* pattern matching using case expressions. A function definition consisting of multiple defining equations must hence be translated to the form f = λx. **case** x **of** .... The elimination proceeds by iteratively removing the last parameter in the block of equations until none are left.

In our formalization, we opted to combine the notion of abstraction and case expression, yielding *case abstractions*, represented as the Pabs constructor. This is similar to the fn construct in Standard ML, which denotes an anonymous function that immediately matches on its argument [28]. The same construct also exists in Haskell with the LambdaCase language extension. We chose this representation mainly for two reasons: First, it allows for a simpler language grammar because there is only one (shared) constructor for abstraction and case expression. Second, the elimination procedure outlined above does not have to introduce fresh names in the process. Later, when translating to CakeML syntax, fresh names are introduced and proved correct in a separate step.

The set of pairs of pattern and right-hand side inside a case abstraction is referred to as *clauses*. As a short-hand notation, we use Λ{p₁ ⇒ t₁, p₂ ⇒ t₂, ...}.

#### **4.4 Sequential Clauses (sterm)**

#### **datatype** sterm =

Sconst string | Svar string | Sabs ((term × sterm) list) | Sapp sterm sterm

In the term rewriting fragment of HOL, the order of rules is not significant. If a rule matches, it can be applied, regardless when it was defined or proven. This is reflected by the use of sets in the rule and term types. For CakeML, the rules need to be applied in a deterministic order, i.e. sequentially. The sterm type only differs from pterm by using list instead of set. Hence, case abstractions use list brackets: Λ[p₁ ⇒ t₁, p₂ ⇒ t₂, ...].

#### **4.5 Irreducible Terms (value)**

CakeML distinguishes between *expressions* and *values*. Whereas expressions may contain free variables or β-redexes, values are closed and fully evaluated. Both have a notion of abstraction, but values differ from expressions in that they contain an environment binding free variables.

Consider the expression (λx.λy.x) (λz.z), which is rewritten (by β-reduction) to λy.λz.z. Note how the bound variable x disappears, since it is replaced. This is contrary to how programming languages are usually implemented: evaluation does not happen by substituting the argument term t for the bound variable x, but by recording the binding x ↦ t in an environment [24]. A pair of an abstraction and an environment is usually called a *closure* [25,41].

In CakeML, this means that evaluation of the above expression results in the closure

(λy.x, ["x" ↦ (λz.z, [])])

Note the nested structure of the closure, whose environment itself contains a closure.
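This evaluation strategy can be sketched in a few lines (an illustrative call-by-value interpreter of our own, not CakeML's semantics): application binds the argument value in the closure's captured environment instead of substituting into the body.

```python
# Toy terms: ("var", x), ("lam", x, body), ("app", f, a).

def eval_term(t, env):
    kind = t[0]
    if kind == "var":
        return env[t[1]]
    if kind == "lam":
        _, x, body = t
        return ("closure", x, body, dict(env))   # capture the environment
    _, f, a = t
    cf, ca = eval_term(f, env), eval_term(a, env)
    _, x, body, cenv = cf
    return eval_term(body, {**cenv, x: ca})      # bind, do not substitute

# (lambda x. lambda y. x) (lambda z. z)
t = ("app", ("lam", "x", ("lam", "y", ("var", "x"))), ("lam", "z", ("var", "z")))
# evaluates to a closure for "lambda y. x" whose environment binds "x"
# to a closure for "lambda z. z": the nested structure described above
print(eval_term(t, {}))
```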

To reflect this in our formalization, we introduce a type value of values (explanation inline):

#### **datatype** value =

*(*∗ *constructor value: a data constructor applied to multiple values* ∗*)* Vconstr string (value list) | *(*∗ *closure: clauses combined with an environment mapping variables to values* ∗*)* Vabs ((term × sterm) list) (string ⇀ value) | *(*∗ *recursive closures: a group of mutually recursive function bodies with an environment* ∗*)*

Vrecabs (string ⇀ ((term × sterm) list)) string (string ⇀ value)

The above example evaluates to the closure:

$$\mathsf{Vabs}\ [\langle y \rangle \Rightarrow \langle x \rangle]\ [\text{"x"} \mapsto \mathsf{Vabs}\ [\langle z \rangle \Rightarrow \langle z \rangle]\ [\,]]$$

The third case for recursive closures only becomes relevant when we conflate variables and constants. As long as the rule set *rs* is kept separate, recursive calls are straightforward: the appropriate definition for the constant can be looked up there. CakeML knows no such distinction between constants and variables, hence everything has to reside in a single environment σ.

Consider this example of odd and even:

$$\begin{array}{cc} \mathsf{odd}\ 0 = \mathsf{False} & \mathsf{even}\ 0 = \mathsf{True} \\ \mathsf{odd}\ (\mathsf{Suc}\ n) = \mathsf{even}\ n & \mathsf{even}\ (\mathsf{Suc}\ n) = \mathsf{odd}\ n \end{array}$$

When evaluating the term odd k, the definitions of even and odd themselves must be available in the environment captured in the definition of odd. However, it would be cumbersome in HOL to construct such a Vabs that refers to itself. Instead, we capture the expressions used to define odd and even in a recursive closure. Other encodings might be possible, but since we are targeting CakeML, we are opting to model it in a similar way as its authors do.

For the above example, this would result in the following global environment:

$$[\text{"odd"} \mapsto \mathsf{Vrecabs}\ css\ \text{"odd"}\ [],\ \text{"even"} \mapsto \mathsf{Vrecabs}\ css\ \text{"even"}\ []]$$

$$\begin{aligned} \text{where } css = [\text{"odd"} &\mapsto [\langle 0 \rangle \Rightarrow \langle \mathsf{False} \rangle, \langle \mathsf{Suc}\ n \rangle \Rightarrow \langle \mathsf{even}\ n \rangle], \\ \text{"even"} &\mapsto [\langle 0 \rangle \Rightarrow \langle \mathsf{True} \rangle, \langle \mathsf{Suc}\ n \rangle \Rightarrow \langle \mathsf{odd}\ n \rangle]] \end{aligned}$$

Note that in the first line, the right-hand sides are values, but in *css*, they are expressions. The additional string argument of Vrecabs denotes the selected function. When evaluating an application of a recursive closure to an argument (β-reduction), the semantics adds all constituent functions of the closure to the environment used for recursive evaluation.
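The environment extension performed when applying a recursive closure can be sketched in Python (our own modeling, not the paper's Isabelle definitions; the tuple layout for Vrecabs values is an assumption made for illustration):

```python
# Sketch: applying a recursive closure extends the environment with a
# recursive closure for EVERY function of the group, so recursive calls
# can be resolved without coinductive environments.

def mk_rec_env(css, env):
    """Map each function name of the clause group `css` to a recursive
    closure carrying the whole group and the captured environment."""
    return {name: ("Vrecabs", css, name, env) for name in css}

# Hypothetical clause group for the odd/even example; the clause lists
# are left abstract here.
css = {"odd": "<clauses of odd>", "even": "<clauses of even>"}
rec_env = mk_rec_env(css, {})

# Evaluating the body of "odd" under rec_env can now look up "even"
# (and "odd" itself), again obtaining a closure over the whole group.
assert set(rec_env) == {"odd", "even"}
assert rec_env["even"][1] is css and rec_env["even"][2] == "even"
```

This mirrors how the semantics re-creates the closures at each application instead of storing a cyclic environment.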

### **5 Intermediate Semantics and Compiler Phases**

In this section, we will discuss the progression from the de Bruijn-based term language with its small-step semantics given in Fig. 1a to the final CakeML semantics. The compiler starts out with terms of type term and applies multiple phases to eliminate features that are not present in the CakeML source language.

**Fig. 2.** Intermediate semantics and compiler phases

Types term, nterm and pterm each have a small-step semantics only. Type sterm has a small-step and several intermediate big-step semantics that bridge the gap to CakeML. An overview of the intermediate semantics and compiler phases is depicted in Fig. 2. The left-hand column gives an overview of the different phases. The right-hand column gives the types of the rule set and the semantics for each phase; you may want to skip it upon first reading.

$$\text{Step}\ \frac{(\mathit{lhs}, \mathit{rhs}) \in R \qquad \text{match } \mathit{lhs}\ t = \text{Some }\sigma}{R \vdash t \longrightarrow \text{subst }\sigma\ \mathit{rhs}} \qquad \text{Beta}\ \frac{\text{closed } t'}{R \vdash (\Lambda x.\ t)\ \$\ t' \longrightarrow \text{subst } [x \mapsto t']\ t}$$

**Fig. 3.** Small-step semantics for nterm with named bound variables

### **5.1 Side Conditions**

All of the following semantics require some side conditions on the rule set. These conditions are purely syntactic. As an example we list the conditions for the correctness of the first compiler phase:


The conditions for the subsequent phases are sufficiently similar that we do not list them again.

In the formalization, we use named contexts to fix the rules and assumptions on them (*locales* in Isabelle terminology). Each phase has its own locale, together with a proof that after compilation, the preconditions of the next phase are satisfied. Correctness proofs assume the above conditions on R and similar conditions on the term that is reduced. For brevity, this is usually omitted in our presentation.

### **5.2 Naming Bound Variables: From term to nterm**

Isabelle uses de Bruijn indices in the term language for two reasons: substitution does not need to rename bound variables, and α-equivalent terms are equal. In implementations of programming languages, these advantages are not required: typically, substitutions do not happen inside abstractions, and there is no notion of equality of functions. Therefore, CakeML uses named variables, and in this compilation step we get rid of de Bruijn indices.

The "named" semantics is based on the nterm type. The rules that are changed from the original semantics (Fig. 1b) are given in Fig. 3 (Fun and Arg remain unchanged). Notably, β-reduction reuses the substitution function.

For the correctness proof, we need to establish a correspondence between terms and nterms. Translation from nterm to term is trivial: replace each bound variable by the number of abstractions between its occurrence and its binder, and keep free variables as they are. This function is called nterm to term.

The other direction is not unique and requires introduction of *fresh* names for bound variables. In our formalization, we have chosen to use a *monad* to produce these names. This function is called term to nterm. We can also prove the obvious property nterm to term (term to nterm <sup>t</sup>) = <sup>t</sup>, where <sup>t</sup> is a term without dangling de Bruijn indices.
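As a concrete illustration, the two translations and the round-trip property can be sketched in Python (our own modeling; the paper's term to nterm draws names from a fresh-name monad, which we replace here by a plain counter):

```python
# Sketch of both translations between de Bruijn terms and named terms.
# De Bruijn terms: ("bound", i), ("free", x), ("app", t, u), ("abs", body).
# Named terms:     ("var", x), ("app", t, u), ("abs", x, body).

def term_to_nterm(t, ctx=(), counter=None):
    counter = counter if counter is not None else [0]
    kind = t[0]
    if kind == "bound":                    # de Bruijn index -> binder name
        return ("var", ctx[t[1]])
    if kind == "free":
        return ("var", t[1])
    if kind == "app":
        return ("app", term_to_nterm(t[1], ctx, counter),
                       term_to_nterm(t[2], ctx, counter))
    if kind == "abs":
        counter[0] += 1
        x = f"v{counter[0]}"               # fresh name from the counter
        return ("abs", x, term_to_nterm(t[1], (x,) + ctx, counter))

def nterm_to_term(t, ctx=()):
    kind = t[0]
    if kind == "var":
        x = t[1]
        # bound if some enclosing abstraction introduced the name
        return ("bound", ctx.index(x)) if x in ctx else ("free", x)
    if kind == "app":
        return ("app", nterm_to_term(t[1], ctx), nterm_to_term(t[2], ctx))
    if kind == "abs":
        return ("abs", nterm_to_term(t[2], (t[1],) + ctx))

# Round trip on the closed term (in de Bruijn notation) λ. λ. 1, i.e. λx. λy. x:
t = ("abs", ("abs", ("bound", 1)))
assert nterm_to_term(term_to_nterm(t)) == t
```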

Generation of fresh names in general can be thought of as picking a string that is not an element of a (finite) set of already existing names. For Isabelle, the *Nominal* framework [42,43] provides support for reasoning over fresh names, but unfortunately, its definitions are not executable.

Instead, we chose to model generation of fresh names as a monad α fresh with the following primitive operations in addition to the monad operations:

> run :: α fresh ⇒ string set ⇒ α
> fresh name :: string fresh

In our implementation, we have chosen to represent α fresh as roughly isomorphic to the state monad.

Compilation of a rule set proceeds by translation of the right-hand side of all rules:

compile *R* = {(p, term to nterm t) | (p, t) ∈ *R*}

The left-hand side is left unchanged for two reasons: function match expects an argument of type term (see Sect. 4), and patterns do not contain abstractions or bound variables.

**Theorem 1 (Correctness of compilation).** *Assuming a step can be taken with the compiled rule set, it can be reproduced with the original rule set.*

$$\text{compile } R \vdash t \longrightarrow u \implies R \vdash \text{nterm to term}\ t \longrightarrow \text{nterm to term}\ u$$

We prove this by induction over the semantics (Fig. 3).

$$\text{Beta}\ \frac{(\mathit{pat}, \mathit{rhs}) \in C \qquad \text{match } \mathit{pat}\ t = \text{Some }\sigma \qquad \text{closed } t}{R \vdash (\Lambda\ C)\ \$\ t \longrightarrow \text{subst }\sigma\ \mathit{rhs}}$$

$$\text{Step}\ \frac{(\mathit{name}, \mathit{rhs}) \in R}{R \vdash \mathsf{Pconst}\ \mathit{name} \longrightarrow \mathit{rhs}}$$

**Fig. 4.** Small-step semantics for pterm with pattern matching

#### **5.3 Explicit Pattern Matching: From nterm to pterm**

Usually, functions in HOL are defined using *implicit* pattern matching, that is, the left-hand side of an equation is of the form <sup>f</sup> <sup>p</sup><sup>1</sup> ... p<sup>n</sup>, where the <sup>p</sup><sup>i</sup> are patterns over datatype constructors. For any given function f, there may be multiple such equations. In this compilation step, we transform sets of equations for f defined using implicit pattern matching into a single equation for f of the form <sup>f</sup> <sup>=</sup> Λ *<sup>C</sup>*, where *<sup>C</sup>* is a set of clauses.

The strategy we employ currently requires successive elimination of a single parameter from right to left, in a similar fashion to Slind's pattern matching compiler [38, Sect. 3.3.1]. Recall our running example (map). It has arity 2. We omit the ⟨ ⟩ brackets for brevity. First, the list parameter gets eliminated:

$$\begin{aligned} \mathsf{map}\ f = \lambda \ [] &\Rightarrow [] \\ \mid x \# xs \Rightarrow f\ x \# \mathsf{map}\ f\ xs \end{aligned} $$

Finally, the function parameter gets eliminated:

$$\mathsf{map} = \lambda\ f \Rightarrow \left(\lambda\ [] \Rightarrow []\ \mid\ x\ \#\ xs \Rightarrow f\ x\ \#\ \mathsf{map}\ f\ xs\right)$$

The result has arity 0 and is defined by a twice-nested abstraction.
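One elimination step can be sketched as a grouping operation (a Python sketch over our own toy representation of equations, not the paper's Isabelle algorithm):

```python
# One right-to-left elimination step (our own toy representation):
# an equation is (patterns, rhs); the last pattern column becomes the
# clauses of a case abstraction, and equations whose remaining patterns
# coincide are grouped together.

def eliminate_last(eqs):
    groups = {}        # remaining patterns -> clauses for the last column
    for pats, rhs in eqs:
        groups.setdefault(tuple(pats[:-1]), []).append((pats[-1], rhs))
    return [(list(k), ("Abs", cls)) for k, cls in groups.items()]

# map f []       = []
# map f (x # xs) = f x # map f xs
eqs = [(["f", "[]"], "[]"),
       (["f", "x # xs"], "f x # map f xs")]

step1 = eliminate_last(eqs)          # eliminate the list parameter
assert step1 == [(["f"],
                  ("Abs", [("[]", "[]"), ("x # xs", "f x # map f xs")]))]

step2 = eliminate_last(step1)        # then the function parameter
assert step2[0][0] == []             # arity 0: a twice-nested abstraction
```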

**Semantics.** The target semantics is given in Fig. 4 (the Fun and Arg rules from previous semantics remain unchanged). We start out with a rule set *R* that allows only implicit pattern matching. After elimination, only explicit pattern matching remains. The modified Step rule merely replaces a constant by its definition, without taking arguments into account.

**Restrictions.** For the transformation to work, we need a strong assumption about the structure of the patterns <sup>p</sup><sup>i</sup> to avoid the following situation:

$$\begin{aligned} \mathsf{map}\ f\ [] &= [] \\ \mathsf{map}\ g\ (x\ \#\ xs) &= g\ x\ \#\ \mathsf{map}\ g\ xs \end{aligned}$$

Through elimination, this would turn into:

$$\begin{aligned} \mathsf{map} = \lambda\ f &\Rightarrow (\lambda\ [] \Rightarrow []) \\ \mid\ g &\Rightarrow (\lambda\ x\ \#\ xs \Rightarrow g\ x\ \#\ \mathsf{map}\ g\ xs) \end{aligned}$$

$$\text{Step}\ \frac{(\mathit{name}, \mathit{rhs}) \in R}{R \vdash \mathsf{Sconst}\ \mathit{name} \longrightarrow \mathit{rhs}} \qquad \text{Beta}\ \frac{\text{first\_match } cs\ t = \text{Some }(\sigma, \mathit{rhs}) \qquad \text{closed } t}{R \vdash (\Lambda\ cs)\ \$\ t \longrightarrow \text{subst }\sigma\ \mathit{rhs}}$$

### **Fig. 5.** Small-step semantics for sterm

Even though the original equations were non-overlapping, we suddenly obtained an abstraction with two overlapping patterns. Slind observed a similar problem [38, Sect. 3.3.2] in his algorithm. Therefore, he only permits *uniform* equations, as defined by Wadler [36, Sect. 5.5]. Here, we can give a formal characterization of our requirements as a computable function on pairs of patterns:

**fun** pat compat :: term ⇒ term ⇒ bool **where**
pat compat (*t*₁ \$ *t*₂) (*u*₁ \$ *u*₂) ⟷ pat compat *t*₁ *u*₁ ∧ (*t*₁ = *u*₁ ⟶ pat compat *t*₂ *u*₂)
pat compat *t* *u* ⟷ (overlapping *t* *u* ⟶ *t* = *u*)

This compatibility constraint ensures that any two overlapping patterns (of the same column) <sup>p</sup>i,k and <sup>p</sup>j,k are equal and are thus appropriately grouped together in the elimination procedure. We require all defining equations of a constant to be mutually compatible. Equations violating this constraint will be flagged during embedding (Sect. 3), whereas the pattern elimination algorithm always succeeds.

While this rules out some theoretically possible pattern combinations (e.g. the *diagonal* function [36, Sect. 5.5]), in practice, we have not found this to be a problem: All of the function definitions we have tried (Sect. 8) satisfied pattern compatibility (after automatic renaming of pattern variables). As a last resort, the user can manually instantiate function equations. Although this will always lead to a pattern compatible definition, it is not done automatically, due to the potential blow-up.

**Discussion.** Because this compilation phase is non-trivial and imposes some minor restrictions on the set of function definitions that can be processed, we may provide an alternative implementation in the future. Instead of eliminating patterns from right to left, patterns may be grouped in tuples. The above example would be translated into:

$$\begin{aligned} \mathsf{map} = \lambda \,(f, []) &\Rightarrow [] \\ | \,(f, x \,\#\, xs) &\Rightarrow f \, x \,\#\, \mathsf{map} \, f \, xs \end{aligned} $$

We would then leave the compilation of patterns for the CakeML compiler, which has no pattern compatibility restriction.

The obvious disadvantage however is that this would require the knowledge of a tuple type in the term language which is otherwise unaware of concrete datatypes.

#### **5.4 Sequentialization: From pterm to sterm**

The semantics of pterm and sterm differ only in rule Step and Beta. Figure 5 shows the modified rules. Instead of any matching clause, the first matching clause in a case abstraction is picked.

For the correctness proof, the order of clauses does not matter: we only need to prove that a step taken in the sequential semantics can be reproduced in the unordered semantics. As long as no rules are dropped, this is trivially true. For that reason, the compiler orders the clauses lexicographically. At the same time the rules are also converted from type (string × pterm) set to (string × sterm) list. Below, *rs* will always denote a list of the latter type.
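The ordered clause lookup can be sketched as follows (a Python sketch with our own simplistic matcher, in which a lowercase pattern is a variable and anything else must match literally):

```python
def match(pat, t):
    """Very simplified matcher: a lowercase pattern is a variable that
    binds anything; otherwise the pattern must equal the term."""
    if pat.islower():
        return {pat: t}
    return {} if pat == t else None

def first_match(clauses, t):
    # Return the bindings and right-hand side of the FIRST clause that
    # matches, mirroring the ordered (sequential) semantics.
    for pat, rhs in clauses:
        sigma = match(pat, t)
        if sigma is not None:
            return sigma, rhs
    return None

clauses = [("0", "False"), ("n", "even n")]
assert first_match(clauses, "0") == ({}, "False")       # first clause wins
assert first_match(clauses, "Suc 0") == ({"n": "Suc 0"}, "even n")
```

In the unordered semantics, both clauses could fire for the term `0`; picking the first one is one of the behaviours the unordered semantics admits, which is why correctness is straightforward.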

$$\begin{gathered}
\text{Const}\ \frac{(\mathit{name}, \mathit{rhs}) \in rs}{rs, \sigma \vdash \mathsf{Sconst}\ \mathit{name} \downarrow \mathit{rhs}} \qquad
\text{Var}\ \frac{\sigma\ \mathit{name} = \text{Some } v}{rs, \sigma \vdash \mathsf{Svar}\ \mathit{name} \downarrow v} \\
\text{Abs}\ \frac{}{rs, \sigma \vdash \Lambda\ cs \downarrow \Lambda\ [(\mathit{pat}, \text{subst } (\sigma - \text{frees } \mathit{pat})\ t) \mid (\mathit{pat}, t) \leftarrow cs]} \\
\text{Comb}\ \frac{rs, \sigma \vdash t \downarrow \Lambda\ cs \qquad rs, \sigma \vdash u \downarrow u' \qquad \text{first\_match } cs\ u' = \text{Some }(\sigma', \mathit{rhs}) \qquad rs, \sigma \mathbin{+\!+} \sigma' \vdash \mathit{rhs} \downarrow v}{rs, \sigma \vdash t\ \$\ u \downarrow v} \\
\text{Constr}\ \frac{\mathit{name} \in \mathit{constructors} \qquad rs, \sigma \vdash t_1 \downarrow u_1 \quad \cdots \quad rs, \sigma \vdash t_n \downarrow u_n}{rs, \sigma \vdash \mathsf{Sconst}\ \mathit{name}\ \$\ t_1\ \$\ \ldots\ \$\ t_n \downarrow \mathsf{Sconst}\ \mathit{name}\ \$\ u_1\ \$\ \ldots\ \$\ u_n}
\end{gathered}$$

**Fig. 6.** Big-step semantics for sterm

#### **5.5 Big-Step Semantics for sterm**

This big-step semantics for sterm is not a compiler phase but moves towards the desired evaluation semantics. In this first step, we reuse the sterm type for evaluation results, instead of evaluating to the separate type value. This allows us to ignore environment capture in closures for now.

All previous −→ relations were parametrized by a rule set. Now the big-step predicate is of the form *rs*, σ ⊢ t ↓ t′, where σ :: string ⇀ sterm is a variable environment.

This semantics also introduces the distinction between *constructors* and *defined constants*. If C is a constructor, the term ⟨C t₁ ... tₙ⟩ is evaluated to ⟨C t₁′ ... tₙ′⟩, where the tᵢ′ are the results of evaluating the tᵢ.

The full set of rules is shown in Fig. 6. They deserve a short explanation:

Const. Constants are retrieved from the rule set *rs*.

Var. Variables are retrieved from the environment σ.

Abs. In order to achieve the intended invariant, abstractions are evaluated to their fully substituted form.


$$\begin{gathered}
\text{Const}\ \frac{(\mathit{name}, \mathit{rhs}) \in rs}{rs, \sigma \vdash \mathsf{Sconst}\ \mathit{name} \downarrow \mathit{rhs}} \qquad
\text{Var}\ \frac{\sigma\ \mathit{name} = \text{Some } v}{rs, \sigma \vdash \mathsf{Svar}\ \mathit{name} \downarrow v} \qquad
\text{Abs}\ \frac{}{rs, \sigma \vdash \Lambda\ cs \downarrow \mathsf{Vabs}\ cs\ \sigma} \\
\text{Comb}\ \frac{rs, \sigma \vdash t \downarrow \mathsf{Vabs}\ cs\ \sigma' \qquad rs, \sigma \vdash u \downarrow v \qquad \text{first\_match } cs\ v = \text{Some }(\sigma'', \mathit{rhs}) \qquad rs, \sigma' \mathbin{+\!+} \sigma'' \vdash \mathit{rhs} \downarrow v'}{rs, \sigma \vdash t\ \$\ u \downarrow v'} \\
\text{RecComb}\ \frac{rs, \sigma \vdash t \downarrow \mathsf{Vrecabs}\ css\ \mathit{name}\ \sigma' \qquad css\ \mathit{name} = \text{Some } cs \qquad rs, \sigma \vdash u \downarrow v \qquad \text{first\_match } cs\ v = \text{Some }(\sigma'', \mathit{rhs}) \qquad rs, \sigma' \mathbin{+\!+} \sigma'' \vdash \mathit{rhs} \downarrow v'}{rs, \sigma \vdash t\ \$\ u \downarrow v'} \\
\text{Constr}\ \frac{\mathit{name} \in \mathit{constructors} \qquad rs, \sigma \vdash t_1 \downarrow v_1 \quad \cdots \quad rs, \sigma \vdash t_n \downarrow v_n}{rs, \sigma \vdash \mathsf{Sconst}\ \mathit{name}\ \$\ t_1\ \$\ \ldots\ \$\ t_n \downarrow \mathsf{Vconstr}\ \mathit{name}\ [v_1, \ldots, v_n]}
\end{gathered}$$

**Fig. 7.** Evaluation semantics

**Lemma 1 (Closedness invariant).** *If* σ *contains only closed terms,* frees t ⊆ dom σ*, and rs*, σ ⊢ t ↓ t′*, then* t′ *is closed.*

Correctness of the big-step w.r.t. the small-step semantics is proved easily by induction on the former:

**Lemma 2.** *For any closed environment* σ *satisfying* frees t <sup>⊆</sup> dom σ*,*

$$rs, \sigma \vdash t \downarrow u \implies rs \vdash \text{subst }\sigma\ t \longrightarrow^{*} u$$

By setting σ = [], we obtain:

**Theorem 2 (Correctness).** *rs*, [] ⊢ t ↓ u ∧ closed t ⟹ *rs* ⊢ t −→<sup>∗</sup> u

### **5.6 Evaluation Semantics: Refining sterm to value**

At this point, we introduce the concept of values into the semantics, while still keeping the rule set (for constants) and the environment (for variables) separate. The evaluation rules are specified in Fig. 7 and represent a departure from the original rewriting semantics: a term does not evaluate to another term but to an object of a different type, a value. We still use ↓ as notation, because big-step and evaluation semantics can be disambiguated by their types.

The evaluation model itself is fairly straightforward. As explained in Sect. 4.5, abstraction terms are evaluated to closures capturing the current variable environment. Note that at this point, recursive closures are not treated differently from non-recursive closures. In a later stage, when *rs* and σ are merged, this distinction becomes relevant.
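Deferring substitution by capturing the environment can be sketched as follows (our own Python modeling, using single-binder abstractions instead of the paper's clause lists):

```python
# Minimal sketch of the value semantics: abstractions evaluate to
# closures that capture the current environment; substitution is
# deferred until application.

def eval_(env, t):
    kind = t[0]
    if kind == "var":
        return env[t[1]]
    if kind == "abs":                       # Abs rule: capture env now
        return ("Vabs", t[1], t[2], dict(env))
    if kind == "app":                       # Comb rule
        f = eval_(env, t[1])
        v = eval_(env, t[2])
        _, x, body, cenv = f
        return eval_({**cenv, x: v}, body)  # evaluate body in extended env
    if kind == "const":
        return ("Vconstr", t[1], [])

# (λx. λy. x) applied to the constructor True: the inner closure
# captures x ↦ True in its environment instead of substituting.
t = ("app", ("abs", "x", ("abs", "y", ("var", "x"))), ("const", "True"))
v = eval_({}, t)
assert v[0] == "Vabs" and v[3]["x"] == ("Vconstr", "True", [])
```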

We will now explain each rule that has changed from the previous semantics:


**Conversion Between sterm and value**. To establish a correspondence between evaluating a term to an sterm and to a value, we apply the same trick as in Sect. 5.2. Instead of specifying a complicated relation, we translate value back to sterm: simply apply the substitutions in the captured environments to the clauses.

The translation rules for Vabs and Vrecabs are kept similar to the Abs rule from the big-step semantics (Fig. 6). Roughly speaking, the big-step semantics always keeps terms fully substituted, whereas the evaluation semantics defers substitution.

Similarly to Sect. 5.2, we can also define a function sterm to value ::sterm ⇒ value and prove that one function is the inverse of the other.

**Matching.** The value type, instead of using binary function application as all other term types, uses n-ary constructor application. This introduces a conceptual mismatch between (binary) patterns and values. To make the proofs easier, we introduce an intermediate type of n-ary patterns. This intermediate type can be optimized away by fusion.
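The mismatch can be seen by flattening an application spine into n-ary form, sketched here in Python (strip_comb is a conventional name for this operation; the tuple representation is our own):

```python
# Sketch of bridging binary application and n-ary constructor values:
# flatten the spine C $ t1 $ ... $ tn into (C, [t1, ..., tn]).

def strip_comb(t):
    args = []
    while isinstance(t, tuple) and t[0] == "app":
        args.append(t[2])       # collect arguments from the outside in
        t = t[1]
    return t, list(reversed(args))

# (C $ t1) $ t2 flattens to the head C with argument list [t1, t2]:
t = ("app", ("app", "C", "t1"), "t2")
assert strip_comb(t) == ("C", ["t1", "t2"])
```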

**Correctness.** The correctness proof requires a number of interesting lemmas.

**Lemma 3 (Substitution before evaluation).** *Assuming that a term* t *can be evaluated to a value* u *given a closed environment* σ*, it can be evaluated to the same value after substitution with a sub-environment* σ *. Formally: rs*, σ t <sup>↓</sup> u <sup>∧</sup> σ <sup>⊆</sup> <sup>σ</sup> <sup>→</sup> *rs*, σ subst <sup>σ</sup> <sup>t</sup> <sup>↓</sup> <sup>u</sup>

This justifies the "pre-substitution" exhibited by the Abs rule in the big-step semantics in contrast to the environment-capturing Abs rule in the evaluation semantics.

**Theorem 3 (Correctness).** *Let* σ *be a closed environment and* t *a term which only contains free variables in* dom σ*. Then, an evaluation to a value rs*, σ t <sup>↓</sup> v *can be reproduced in the big-step semantics as rs* , map value to sterm σ t <sup>↓</sup> value to sterm <sup>v</sup>*, where rs* = [(*name*, value to sterm *rhs*) <sup>|</sup> (*name*, *rhs*) <sup>←</sup> *rs*]*.*

**Instantiating the Correctness Theorem.** The correctness theorem states that, for any given evaluation of a term t with a given environment *rs*, σ containing values, we can reproduce that evaluation in the big-step semantics using a derived list of rules *rs* and an environment <sup>σ</sup> containing sterms that are generated by the value to sterm function. But recall the diagram in Fig. 2. In our scenario, we start with a given rule set of sterms (that has been compiled from a rule set of terms). Hence, the correctness theorem only deals with the opposite direction.

It remains to construct a suitable *rs*<sup>v</sup> such that applying value to sterm to it yields the given sterm rule set. We can exploit the side condition (Sect. 5.1) that all bindings define functions, not constants:

**Definition 6 (Global clause set).** *The mapping* global css::string ((term<sup>×</sup> sterm) list) *is obtained by stripping the* Sabs *constructors from all definitions and converting the resulting list to a mapping.*

For each definition with name f we define a corresponding value v<sub>f</sub> = Vrecabs global css f []. In other words, each function is now represented by a recursive closure bundling all functions. Applying value to sterm to v<sub>f</sub> returns the original definition of f. Let *rs* denote the original sterm rule set and *rs*<sup>v</sup> the environment mapping each f to its v<sub>f</sub>.

The variable environments σ and σ can safely be set to the empty mapping, because top-level terms are evaluated without any free variable bindings.

**Corollary 1 (Correctness).** *rs*<sup>v</sup>, [] ⊢ t ↓ v ⟹ *rs*, [] ⊢ t ↓ value to sterm v

Note that this step was not part of the compiler (although *rs*<sup>v</sup> is computable) but it is a refinement of the semantics to support a more modular correctness proof.

*Example.* Recall the odd and even example from Sect. 4.5. After compilation to sterm, the rule set looks like this:

$$\begin{aligned} rs = \{&(\text{"odd"}, \mathsf{Sabs}\ [\langle 0 \rangle \Rightarrow \langle \mathsf{False} \rangle, \langle \mathsf{Suc}\ n \rangle \Rightarrow \langle \mathsf{even}\ n \rangle]), \\ &(\text{"even"}, \mathsf{Sabs}\ [\langle 0 \rangle \Rightarrow \langle \mathsf{True} \rangle, \langle \mathsf{Suc}\ n \rangle \Rightarrow \langle \mathsf{odd}\ n \rangle])\} \end{aligned}$$

This can be easily transformed into the following global clause set:

$$\begin{aligned} \text{global css} = [\text{"odd"} &\mapsto [\langle 0 \rangle \Rightarrow \langle \mathsf{False} \rangle, \langle \mathsf{Suc}\ n \rangle \Rightarrow \langle \mathsf{even}\ n \rangle], \\ \text{"even"} &\mapsto [\langle 0 \rangle \Rightarrow \langle \mathsf{True} \rangle, \langle \mathsf{Suc}\ n \rangle \Rightarrow \langle \mathsf{odd}\ n \rangle]] \end{aligned}$$

Finally, *rs*<sup>v</sup> is computed by creating a recursive closure for each function:

*rs*<sup>v</sup> = ["odd" ↦ Vrecabs global css "odd" [], "even" ↦ Vrecabs global css "even" []]

$$\begin{gathered}
\text{Const}\ \frac{\mathit{name} \notin \mathit{constructors} \qquad \sigma\ \mathit{name} = \text{Some } v}{\sigma \vdash \mathsf{Sconst}\ \mathit{name} \downarrow v} \qquad
\text{Var}\ \frac{\sigma\ \mathit{name} = \text{Some } v}{\sigma \vdash \mathsf{Svar}\ \mathit{name} \downarrow v} \qquad
\text{Abs}\ \frac{}{\sigma \vdash \Lambda\ cs \downarrow \mathsf{Vabs}\ cs\ \sigma} \\
\text{Comb}\ \frac{\sigma \vdash t \downarrow \mathsf{Vabs}\ cs\ \sigma' \qquad \sigma \vdash u \downarrow v \qquad \text{first\_match } cs\ v = \text{Some }(\sigma'', \mathit{rhs}) \qquad \sigma' \mathbin{+\!+} \sigma'' \vdash \mathit{rhs} \downarrow v'}{\sigma \vdash t\ \$\ u \downarrow v'} \\
\text{RecComb}\ \frac{\sigma \vdash t \downarrow \mathsf{Vrecabs}\ css\ \mathit{name}\ \sigma' \qquad css\ \mathit{name} = \text{Some } cs \qquad \sigma \vdash u \downarrow v \qquad \text{first\_match } cs\ v = \text{Some }(\sigma'', \mathit{rhs}) \qquad \sigma' \mathbin{+\!+} \text{mk\_rec\_env } css\ \sigma' \mathbin{+\!+} \sigma'' \vdash \mathit{rhs} \downarrow v'}{\sigma \vdash t\ \$\ u \downarrow v'} \\
\text{Constr}\ \frac{\mathit{name} \in \mathit{constructors} \qquad \sigma \vdash t_1 \downarrow v_1 \quad \cdots \quad \sigma \vdash t_n \downarrow v_n}{\sigma \vdash \mathsf{Sconst}\ \mathit{name}\ \$\ t_1\ \$\ \ldots\ \$\ t_n \downarrow \mathsf{Vconstr}\ \mathit{name}\ [v_1, \ldots, v_n]}
\end{gathered}$$

**Fig. 8.** ML-style evaluation semantics

#### **5.7 Evaluation with Recursive Closures**

CakeML distinguishes between non-recursive and recursive closures [30]. This distinction is also present in the value type. In this step, we will conflate variables with constants, which necessitates a special treatment of recursive closures. Therefore we introduce a new predicate σ ⊢ t ↓ v in Fig. 8 (in contrast to the previous *rs*, σ ⊢ t ↓ v). We examine the rules one by one:


Comb. Identical to the previous evaluation semantics.

RecComb. Almost identical to the evaluation semantics. Additionally, for each function (*name*, *cs*) <sup>∈</sup> *css*, a new recursive closure Vrecabs *css name* σ is created and inserted into the environment. This ensures that after the first call to a recursive function, the function itself is present in the environment to be called recursively, without having to introduce coinductive environments.

Constr. Identical to the evaluation semantics.

**Conflating Constants and Variables.** By merging the rule set *rs* with the variable environment σ, it becomes necessary to discuss possible clashes. Previously, the syntactic distinction between Svar and Sconst meant that x and x are not ambiguous: all semantics up to the evaluation semantics clearly specify where to look for the substitute. This is not the case in functional languages where functions and variables are not distinguished syntactically.

Instead, we rely on the fact that the initial rule set only defines constants. All variables are introduced by matching before β-reduction (that is, in the Comb and RecComb rules). The Abs rule does not change the environment. Hence it suffices to assume that variables in patterns must not overlap with constant names (see Sect. 5.1).

**Correspondence Relation.** Both constant definitions and values of variables are recorded in a single environment σ. This also applies to the environment contained in a closure. The correspondence relation thus needs to take the different sets of bindings in closures into account.

Hence, we define a relation ≈<sup>v</sup> that is implicitly parametrized on the rule set *rs* and compares environments. We call it *right-conflating*, because in a correspondence <sup>v</sup> <sup>≈</sup><sup>v</sup> <sup>u</sup>, any bound environment in <sup>u</sup> is thought to contain both variables and constants, whereas in v, any bound environment contains only variables.

**Definition 7 (Right-conflating correspondence).** *We define* ≈*<sup>v</sup> coinductively as follows:*

$$\begin{gathered}
\frac{v_1 \approx_{\mathrm{v}} u_1 \quad \cdots \quad v_n \approx_{\mathrm{v}} u_n}{\mathsf{Vconstr}\ \mathit{name}\ [v_1, \ldots, v_n] \approx_{\mathrm{v}} \mathsf{Vconstr}\ \mathit{name}\ [u_1, \ldots, u_n]} \\
\frac{\forall x \in \text{frees } cs.\ \sigma_1\ x \approx_{\mathrm{v}} \sigma_2\ x \qquad \forall x \in \text{consts } cs.\ rs\ x \approx_{\mathrm{v}} \sigma_2\ x}{\mathsf{Vabs}\ cs\ \sigma_1 \approx_{\mathrm{v}} \mathsf{Vabs}\ cs\ \sigma_2} \\
\frac{\begin{gathered}\forall cs \in \text{range } css.\ \forall x \in \text{frees } cs.\ \sigma_1\ x \approx_{\mathrm{v}} \sigma_2\ x \\ \forall cs \in \text{range } css.\ \forall x \in \text{consts } cs.\ rs\ x \approx_{\mathrm{v}} (\sigma_2 \mathbin{+\!+} \text{mk\_rec\_env } css\ \sigma_2)\ x\end{gathered}}{\mathsf{Vrecabs}\ css\ \mathit{name}\ \sigma_1 \approx_{\mathrm{v}} \mathsf{Vrecabs}\ css\ \mathit{name}\ \sigma_2}
\end{gathered}$$

Consequently, ≈<sup>v</sup> is not reflexive.

**Correctness.** The correctness lemma is straightforward to state:

**Theorem 4 (Correctness).** *Let* σ *be an environment,* t *be a closed term, and* v *a value such that* σ ⊢ t ↓ v*. If for all constants* x *occurring in* t*, rs* x ≈<sup>v</sup> σ x *holds, then there is a* u *such that rs*, [] ⊢ t ↓ u *and* u ≈<sup>v</sup> v*.*

As usual, the rather technical proof proceeds via induction over the semantics (Fig. 8). It is important to note that the global clause set construction (Sect. 5.6) satisfies the preconditions of this theorem:

**Lemma 4.** *If name is the name of a constant in rs, then*

Vrecabs global css *name* [] ≈<sup>v</sup> Vrecabs global css *name* []

Because ≈<sup>v</sup> is defined coinductively, the proof of this precondition proceeds by coinduction.

### **5.8 CakeML**

*CakeML* is a verified implementation of a subset of Standard ML [24,40]. It comprises a parser, type checker, formal semantics and backend for machine code. The semantics has been formalized in Lem [29], which allows export to Isabelle theories.

Our compiler targets CakeML's abstract syntax tree. However, we do not make use of certain CakeML features; notably mutable cells, modules, and literals. We have derived a smaller, executable version of the original CakeML semantics, called *CupCakeML*, together with an equivalence proof. The correctness proof of the last compiler phase establishes a correspondence between CupCakeML and the final semantics of our compiler pipeline.

For the correctness proof of the CakeML compiler, its authors have extracted the Lem specification into HOL4 theories [1]. In our work, we directly target CakeML abstract syntax trees (thereby bypassing the parser) and use its big-step semantics, which we have extracted into Isabelle.<sup>2</sup>

**Conversion from sterm to exp.** After the series of translations described in the earlier sections, our terms are syntactically close to CakeML's terms (Cake.exp). The only remaining differences are outlined below:


**Types.** During embedding (Sect. 3), all type information is erased. Yet, CakeML performs some limited form of type checking at run-time: constructing and matching data must always be fully applied. That is, data constructors must always occur with all arguments supplied on right-hand and left-hand sides.

Fully applied constructors in terms can be easily guaranteed by simple preprocessing. For patterns however, this must be ensured throughout the compilation pipeline; it is (like other syntactic constraints) another side condition imposed on the rule set (Sect. 5.1).
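Such a check can be sketched in Python (our own representation; `arities` is an assumed constructor-arity table, not part of the paper's formalization):

```python
# Sketch of the run-time typing constraint: every occurrence of a data
# constructor must carry exactly its declared number of arguments.

arities = {"Nil": 0, "Cons": 2}     # assumed constructor-arity table

def strip_comb(t):
    """Flatten an application spine into head and argument list."""
    args = []
    while isinstance(t, tuple) and t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, list(reversed(args))

def constructors_fully_applied(t):
    head, args = strip_comb(t)
    if head in arities and arities[head] != len(args):
        return False
    return all(constructors_fully_applied(a) for a in args)

assert constructors_fully_applied(("app", ("app", "Cons", "x"), "Nil"))
assert not constructors_fully_applied(("app", "Cons", "x"))  # missing arg
```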

<sup>2</sup> Based on a repository snapshot from March 27, 2017 (0c48672).

The shape of datatypes and constructors is managed in CakeML's environment. This particular piece of information is allowed to vary in closures, since ML supports local type definitions. Tracking this would greatly complicate our proofs. Hence, we fix a global set of constructors and enforce that all values use exactly that set.

**Correspondence Relation.** We define two different correspondence relations: One for values and one for expressions.

#### **Definition 8 (Expression correspondence)**

$$\begin{gathered}
\text{Var}\ \frac{}{\text{rel}_{\mathrm{e}}\ (\mathsf{Svar}\ n)\ (\mathsf{Cake.Var}\ n)} \qquad
\text{Const}\ \frac{n \notin \mathit{constructors}}{\text{rel}_{\mathrm{e}}\ (\mathsf{Sconst}\ n)\ (\mathsf{Cake.Var}\ n)} \\
\text{Constr}\ \frac{n \in \mathit{constructors} \qquad \text{rel}_{\mathrm{e}}\ t_1\ u_1 \quad \cdots}{\text{rel}_{\mathrm{e}}\ (\mathsf{Sconst}\ \mathit{name}\ \$\ t_1\ \$\ \ldots\ \$\ t_n)\ (\mathsf{Cake.Con}\ (\text{Some}\ (\mathsf{Cake.Short}\ \mathit{name}))\ [u_1, \ldots, u_n])} \\
\text{App}\ \frac{\text{rel}_{\mathrm{e}}\ t_1\ u_1 \qquad \text{rel}_{\mathrm{e}}\ t_2\ u_2}{\text{rel}_{\mathrm{e}}\ (t_1\ \$\ t_2)\ (\mathsf{Cake.App}\ \mathsf{Cake.Opapp}\ [u_1, u_2])} \\
\text{Fun}\ \frac{n \notin \text{ids}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots]) \cup \mathit{constructors} \qquad q_1 = \text{mk\_ml\_pat}\ p_1 \qquad \text{rel}_{\mathrm{e}}\ t_1\ u_1 \quad \cdots}{\text{rel}_{\mathrm{e}}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots])\ (\mathsf{Cake.Fun}\ n\ (\mathsf{Cake.Mat}\ (\mathsf{Cake.Var}\ n)\ [q_1 \Rightarrow u_1, \ldots])) } \\
\text{Mat}\ \frac{\text{rel}_{\mathrm{e}}\ t\ u \qquad q_1 = \text{mk\_ml\_pat}\ p_1 \qquad \text{rel}_{\mathrm{e}}\ t_1\ u_1 \quad \cdots}{\text{rel}_{\mathrm{e}}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots]\ \$\ t)\ (\mathsf{Cake.Mat}\ u\ [q_1 \Rightarrow u_1, \ldots])}
\end{gathered}$$

We will explain each of the rules briefly here.

Var. Variables are directly related by identical name.


There is no separate relation for patterns, because their translation is simple.

The value correspondence (rel_v) is structurally simpler. In the case of constructor values (Vconstr and Cake.Conv), arguments are compared recursively. Closures and recursive closures are compared extensionally, i.e. only bindings that occur in the body are checked recursively for correspondence.

**Correctness.** We use the same trick as in Sect. 5.6 to obtain a suitable environment for CakeML evaluation based on the rule set *rs*.

**Theorem 5 (Correctness).** *If the compiled expression* sterm_to_cake t *terminates with a value* u *in the CakeML semantics, then there is a value* v *such that* rel_v v u *and* rs ⊢ t ↓ v*.*

### **6 Composition**

The complete compiler pipeline consists of multiple phases. Correctness of each phase is justified with respect to intermediate semantics and correspondence relations, most of which are rather technical. Whereas the compiler itself may be complex and impenetrable, the trustworthiness of the overall construction hinges on the obviousness of those correspondence relations.

Fortunately, under the assumption that terms to be evaluated and the resulting values do not contain abstractions – or closures, respectively – all of the correspondence relations collapse to simple structural equality: two terms are related if and only if one can be converted to the other by consistent renaming of term constructors.

The actual compiler can be characterized by two functions. Firstly, the translation from term to Cake.exp is a simple composition of the individual term translation functions:

```
definition term_to_cake :: term ⇒ Cake.exp where
term_to_cake = sterm_to_cake ◦ pterm_to_sterm ◦ nterm_to_pterm ◦ term_to_nterm
```

Secondly, the function that translates function definitions composes the phases as outlined in Fig. 2, including the iterated application of pattern elimination:

```
definition compile :: (term × term) fset ⇒ Cake.dec where
compile = Cake.Dletrec ◦ compile_srules_to_cake ◦ compile_prules_to_srules ◦
  compile_irules_to_srules ◦ compile_irules_iter ◦ compile_crules_to_irules ◦
  consts_of ◦ compile_rules_to_nrules
```
Each function compile_* corresponds to one compiler phase; the remaining functions are trivial. This produces a CakeML top-level declaration. We prove that evaluating this declaration in the top-level semantics (evaluate_prog) results in an environment cake_sem_env. But cake_sem_env can also be computed via another instance of the global clause set trick (Sect. 5.6).

Equipped with these functions, we can state the final correctness theorem:

```
theorem compiled_correct:
  (∗ If CakeML evaluation of a term succeeds ... ∗)
  assumes evaluate False cake_sem_env s (term_to_cake t) (s', Rval ml_v)
  (∗ ... producing a constructor term without closures ... ∗)
  assumes cake_abstraction_free ml_v
  (∗ ... and some syntactic properties of the involved terms hold ... ∗)
  assumes closed t and ¬ shadows_consts (heads rs ∪ constructors) t and
    welldefined (heads rs ∪ constructors) t and wellformed t
  (∗ ... then this evaluation can be reproduced in the term−rewriting semantics ∗)
  shows rs ⊢ t −→∗ cake_to_term ml_v
```


**Fig. 9.** Dictionary construction in Isabelle

This theorem directly relates the evaluation of a term t in the full CakeML (including mutability and exceptions) to the evaluation in the initial higher-order term rewriting semantics. The evaluation of t happens using the environment produced from the initial rule set. Hence, the theorem can be interpreted as the correctness of the pseudo-ML expression **let rec** *rs* **in** t.

Observe that in the assumption, the conversion goes from our terms to CakeML expressions, whereas in the conclusion, the conversion goes the opposite direction.

### **7 Dictionary Construction**

Isabelle's type system supports *type classes* (or simply *classes*) [18,44], whereas CakeML does not. In order not to complicate the correctness proofs, type classes are not supported by our embedded term language either. Instead, we eliminate classes and instances by a dictionary construction [19] before embedding into the term language. Haftmann and Nipkow give a pen-and-paper correctness proof of this construction [17, Sect. 4.1]. We have augmented the dictionary construction with the generation of a certificate theorem that shows the equivalence of the two versions of a function: with type classes and with dictionaries. This section briefly explains our dictionary construction.

Figure 9 shows a simple example of a dictionary construction. Type variables may carry *class constraints* (e.g. α :: add). The basic idea is that classes become *dictionaries* containing the functions of that class; class instances become dictionary definitions. Dictionaries are realized as datatypes. Class constraints become additional dictionary parameters for that class. In the example, class add becomes dict_add; function f is translated into f′, which takes an additional parameter of type dict_add. In reality our tool does not produce the Isabelle source code shown in Fig. 9b but performs the constructions internally. The correctness lemma f_eq is proved automatically. Its precondition expresses that the dictionary must contain exactly the function(s) of class add. For any monomorphic instance, the precondition can be proved outright based on the certificate theorems proved for each class instance, as explained next.
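The construction can be mimicked in ordinary code. The following Python sketch mirrors the idea of Fig. 9 under a stated analogy: the class add becomes a record type of its operations, the int instance becomes a record value, and a constrained function takes the record as an explicit extra argument. All names here (DictAdd, inst_add_int, sum_list) are illustrative, not output of the tool.

```python
# A Python analogue of the dictionary construction: the class 'add'
# becomes a record of its operations, and a class-constrained function
# receives the record as an explicit extra parameter.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class DictAdd:                      # the dictionary datatype for 'add'
    plus: Callable[[Any, Any], Any]
    zero: Any

# Instance 'int :: add' becomes a dictionary definition.
inst_add_int = DictAdd(plus=lambda x, y: x + y, zero=0)

# A function with constraint 'alpha :: add' takes the dictionary explicitly.
def sum_list(d: DictAdd, xs):
    acc = d.zero
    for x in xs:
        acc = d.plus(acc, x)
    return acc

print(sum_list(inst_add_int, [1, 2, 3]))  # 6
```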

Not shown in the example is the translation of class instances. The basic form of a class instance in Isabelle is τ :: (c1, ..., cn) c, where τ is an n-ary type constructor. It corresponds to Haskell's (c1 α1, ..., cn αn) ⇒ c (τ α1 ... αn) and is translated into a function inst_c_τ :: α1 dict_c1 ⇒ ⋯ ⇒ αn dict_cn ⇒ (α1, ..., αn) τ dict_c, and the following certificate theorem is proved:

cert_c1 dict1 −→ ⋯ −→ cert_cn dictn −→ cert_c (inst_c_τ dict1 ... dictn)
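The shape of this instance translation can be illustrated concretely. In the hypothetical Python sketch below, a pair instance of an add class takes one dictionary per component class, mirroring the form of inst_c_τ above; dictionaries are plain records here, and every name is invented for illustration.

```python
# Hypothetical rendering of the instance translation: the pair instance
# '(alpha, beta) :: add' becomes a function taking one dictionary per
# argument class, like inst_c_tau.
def mk_dict_add(plus, zero):
    return {"plus": plus, "zero": zero}

inst_add_int = mk_dict_add(lambda x, y: x + y, 0)

def inst_add_pair(d1, d2):
    """add-instance for pairs, parameterised by the component instances."""
    return mk_dict_add(
        lambda p, q: (d1["plus"](p[0], q[0]), d2["plus"](p[1], q[1])),
        (d1["zero"], d2["zero"]),
    )

d = inst_add_pair(inst_add_int, inst_add_int)
print(d["plus"]((1, 2), (10, 20)))  # (11, 22)
```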

For a more detailed explanation of how the dictionary construction works, we refer to the corresponding entry in the Archive of Formal Proofs [21].

### **8 Evaluation**

We have tried out our compiler on examples from existing Isabelle formalizations, including an implementation of Huffman encoding, lists and sorting, string functions [39], and various data structures from Okasaki's book [34] (binary search trees, pairing heaps, and leftist heaps). These definitions can be processed with slight modifications: functions need to be totalized (see the end of Sect. 3). However, parts of the tactics required for deep embedding proofs (Sect. 3) are too slow on some functions and hence still need to be optimized.

### **9 Conclusion**

For this paper we have concentrated on the compiler from Isabelle/HOL to CakeML abstract syntax trees. Partial correctness is proved w.r.t. the big-step semantics of CakeML. In the next step we will link our work with the compiler from CakeML to machine code. Tan *et al.* [40, Sect. 10] prove a correctness theorem that relates their semantics with the execution of the compiled machine code. In that paper, they use a newer iteration of the CakeML semantics (functional big-step [35]) than we do here. Both semantics are still present in the CakeML source repository, together with an equivalence proof. Another important step consists of targeting CakeML's native types, e.g. integer numbers and characters.

Evaluation of our compiled programs is already possible via Isabelle's predicate compiler [5], which allows us to turn CakeML's big-step semantics into an executable function. We have used this execution mechanism to establish for sample programs that they terminate successfully. We also plan to prove that our compiled programs terminate, i.e. total correctness.

The total size of this formalization, excluding theories extracted from Lem, is currently approximately 20000 lines of proof text (90%) and ML code (10%). The ML code itself produces relatively simple theorems, which means that there are fewer opportunities for it to go wrong. This constitutes an improvement over certifying approaches that prove complicated properties in ML.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Compositional Verification of Compiler Optimisations on Relaxed Memory**

Mike Dodds<sup>1</sup>, Mark Batty<sup>2</sup>, and Alexey Gotsman<sup>3(B)</sup>

<sup>1</sup> Galois Inc., Portland, Oregon, USA. miked@galois.com
<sup>2</sup> University of Kent, Canterbury, UK. M.J.Batty@kent.ac.uk
<sup>3</sup> IMDEA Software Institute, Madrid, Spain. alexey.gotsman@imdea.org

**Abstract.** A valid compiler optimisation transforms a block in a program without introducing new observable behaviours to the program as a whole. Deciding which optimisations are valid can be difficult, and depends closely on the semantic model of the programming language. Axiomatic relaxed models, such as C++11, present particular challenges for determining validity, because such models allow subtle effects of a block transformation to be observed by the rest of the program. In this paper we present a denotational theory that captures optimisation validity on an axiomatic model corresponding to a fragment of C++11. Our theory allows verifying an optimisation compositionally, by considering only the block it transforms instead of the whole program. Using this property, we realise the theory in the first push-button tool that can verify real-world optimisations under an axiomatic memory model.

### **1 Introduction**

*Context and Objectives.* Any program defines a collection of observable behaviours: a sorting algorithm maps unsorted to sorted sequences, and a paint program responds to mouse clicks by updating a rendering. It is often desirable to transform a program without introducing new observable behaviours – for example, in a compiler optimisation or programmer refactoring. Such transformations are called *observational refinements*, and they ensure that properties of the original program will carry over to the transformed version. It is also desirable for transformations to be *compositional*, meaning that they can be applied to a block of code irrespective of the surrounding program context. Compositional transformations are particularly useful for automated systems such as compilers, where they are known as *peephole optimisations*.

The semantics of the language is highly significant in determining which transformations are valid, because it determines the ways that a block of code being transformed can interact with its context and thereby affect the observable behaviour of the whole program. Our work applies to a relaxed-memory concurrent setting. Thus, the context of a code-block includes both code sequentially before and after the block, and code that runs in parallel. Relaxed memory means that different threads can observe different, apparently contradictory orders of events; such behaviour is permitted by programming languages to reflect CPU-level relaxations and to allow compiler optimisations.

We focus on *axiomatic* memory models of the type used in C/C++ and Java. In axiomatic models, program executions are represented by structures of memory actions and relations on them, and program semantics is defined by a set of axioms constraining these structures. Reasoning about the correctness of program transformations on such memory models is very challenging, and indeed, compiler optimisations have been repeatedly shown unsound with respect to models they were intended to support [23,25]. The fundamental difficulty is that axiomatic models are defined in a global, non-compositional way, making it very challenging to reason compositionally about the single code-block being transformed.

*Approach.* Suppose we have a code-block B, embedded into an unknown program context. We define a *denotation* for the code-block which summarises its behaviour in a restricted representative context. The denotation consists of a set of *histories* which track interactions across the boundary between the code-block and its context, but abstract from the internal structure of the code-block. We can then validate a transformation from a code-block B to B′ by comparing their denotations. This approach is compositional: it requires reasoning only about the code-blocks and representative contexts; the validity of the transformation in an arbitrary context will follow. It is also *fully abstract*, meaning that it can verify any valid transformation: considering only representative contexts and histories does not lose generality.

We also define a variant of our denotation that is *finite* at the cost of losing full abstraction. We achieve this by further restricting the form of contexts one needs to consider in exchange for tracking more information in histories. For example, it is unnecessary to consider executions where two context operations read from the same write.

Using this finite denotation, we implement a prototype verification tool, Stellite. Our tool converts an input transformation into a model in the Alloy language [12], and then checks that the transformation is valid using the Alloy\* solver [18]. Our tool can prove or disprove a range of introduction, elimination, and exchange compiler optimisations. Many of these were verified by hand in previous work; our tool verifies them automatically.

*Contributions.* Our contribution is twofold. First, we define the first fully abstract denotational semantics for an axiomatic relaxed model. Previous proposals in this space targeted either non-relaxed sequential consistency [6] or much more restrictive operational relaxed models [7,13,21]. Second, we show it is feasible to automatically verify relaxed-memory program transformations. Previous techniques required laborious proofs by hand or in a proof assistant [23–27]. Our target model is derived from the C/C++ 2011 standard [22]. However, our aim is not to handle C/C++ per se (especially as the model is in flux in several respects; see Sect. 3.7). Rather, we target the simplest axiomatic model rich enough to demonstrate our approach.

### **2 Observation and Transformation**

*Observational Refinement.* The notion of *observation* is crucial when determining how different programs are related. For example, observations might be I/O behaviour or writes to special variables. Given program executions X1 and X2, we write X1 ≼ex X2 if the observations in X1 are replicated in X2 (defined formally in the following). Lifting this notion, a program P1 *observationally refines* another P2 if every observable behaviour of the former could also occur with the latter; we write this P1 ≼pr P2. More formally, let ⟦−⟧ be the map from programs to sets of executions. Then we define ≼pr as:

$$P_1 \preccurlyeq_{\mathsf{pr}} P_2 \quad \stackrel{\Delta}{\iff} \quad \forall X_1 \in \llbracket P_1 \rrbracket.\ \exists X_2 \in \llbracket P_2 \rrbracket.\ X_1 \preccurlyeq_{\mathsf{ex}} X_2 \tag{1}$$
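Definition (1) can be read operationally as a quantifier alternation over execution sets. A minimal Python sketch, modelling executions as sets of observable actions and ≼ex as set inclusion (both simplifying assumptions of ours, not the paper's definitions):

```python
# Executions are modelled abstractly as frozensets of observable
# actions; obs_leq plays the role of <=_ex and prog_refines of <=_pr.
def obs_leq(x1, x2):
    """X1 <=_ex X2: the observations of X1 are replicated in X2."""
    return x1 <= x2

def prog_refines(execs1, execs2):
    """P1 <=_pr P2 per (1): every X1 in [[P1]] is matched by some X2."""
    return all(any(obs_leq(x1, x2) for x2 in execs2) for x1 in execs1)

p1 = {frozenset({("store", "y", 1)})}
p2 = {frozenset({("store", "y", 1), ("store", "x", 2)})}
print(prog_refines(p1, p2))  # True
print(prog_refines(p2, p1))  # False
```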

*Compositional Transformation.* Many common program transformations are *compositional*: they modify a sequential fragment of the program without examining the rest of the program. We call the former the *code-block* and the latter its *context*. Contexts can include sequential code before and after the block, and concurrent code that runs in parallel with it. Code-blocks are sequential, i.e. they do not feature internal concurrency. A context C and code-block B can be composed to give a whole program C(B).

A transformation B2 ⇝ B1 replaces some instance of the code-block B2 with B1. To validate such a transformation, we must establish whether *every* whole program containing B1 observationally refines the same program with B2 substituted. If this holds, we say that B1 observationally refines B2, written B1 ≼bl B2, defined by lifting ≼pr as follows:

$$B_1 \preccurlyeq_{\mathsf{bl}} B_2 \quad \stackrel{\Delta}{\iff} \quad \forall C.\ C(B_1) \preccurlyeq_{\mathsf{pr}} C(B_2) \tag{2}$$

If B1 ≼bl B2 holds, then the compiler can replace block B2 with block B1 irrespective of the whole program, i.e. B2 ⇝ B1 is a valid transformation. Thus, deciding B1 ≼bl B2 is the core problem in validating compositional transformations.

The language semantics is highly significant in determining observational refinement. For example, the code-blocks B1: store(x,5) and B2: store(x,2); store(x,5) are observationally equivalent in a sequential setting. However, in a concurrent setting the intermediate state x = 2 can be observed in B2 but not in B1, meaning the code-blocks are no longer observationally equivalent. In a relaxed-memory setting there is no global state seen by all threads, which further complicates the notion of observation.

*Compositional Verification.* To establish B1 ≼bl B2, it is difficult to examine all possible syntactic contexts. Our approach is to construct a *denotation* for each code-block: a simplified, ideally finite, summary of possible interactions between the block and its context. We then define a *refinement relation* ⊑ on denotations and use it to establish observational refinement. We write B1 ⊑ B2 when the denotation of B1 refines that of B2.

Refinement on denotations should be *adequate*, i.e., it should validly approximate observational refinement: B1 ⊑ B2 =⇒ B1 ≼bl B2. Hence, if B1 ⊑ B2, then B2 ⇝ B1 is a valid transformation. It is also desirable for the denotation to be *fully abstract*: B1 ≼bl B2 =⇒ B1 ⊑ B2. This means any valid transformation can be verified by comparing denotations. Below we define several versions of ⊑ with different properties.

### **3 Target Language and Core Memory Model**

Our language's memory model is derived from the C/C++ 2011 standard (henceforth '*C11*'), as formalised by [5,22]. However, we simplify our model in several ways; see Sect. 3.7 for details. In C11 terms, our model covers release-acquire and non-atomic operations, and sequentially consistent fences. To simplify the presentation, we at first omit non-atomics and extend our approach to cover them in Sect. 7. Thus, all operations in this section correspond to C11's release-acquire.

### **3.1 Relaxed Memory Primer**

In a sequentially consistent concurrent system, there is a total temporal order on loads and stores, and loads take the value of the most recent store; in particular, they cannot read overwritten values, or values written in the future. A *relaxed* (or *weak*) memory model weakens this total order, allowing behaviours forbidden under sequential consistency. Two standard examples of relaxed behaviour are *store buffering (SB)* and *message passing (MP)*, shown in Fig. 1.

```
store(x,0); store(y,0);         store(f,0); store(x,0);

store(x,1);     store(y,1);     store(x,1);    b := load(f);
v1 := load(y);  v2 := load(x);  store(f,1);    if (b == 1)
                                                 r := load(x);
```
**Fig. 1.** *Left*: store-buffering (SB) example. *Right*: message-passing (MP) example.

In most relaxed models, v1 = v2 = 0 is a possible post-state for SB. This cannot occur on a sequentially consistent system: if v1 = 0, then store(y,1) must be ordered after the load of y, which would order store(x,1) before the load of x, forcing it to assign v2 = 1. In some relaxed models, b = 1 ∧ r = 0 is a possible post-state for MP. This is undesirable if, for example, x is a complex data-structure and f is a flag indicating it has been safely created.
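The claim that v1 = v2 = 0 is impossible for SB under sequential consistency can be checked by brute force: enumerate every interleaving of the two threads against a single shared store. The following Python sketch is our own illustration, not part of the formal model:

```python
# Brute-force all SC interleavings of the SB example and collect the
# reachable (v1, v2) outcomes.
from itertools import permutations

def run_sb(order):
    mem = {"x": 0, "y": 0}   # globals after the initialisation stores
    regs = {}
    prog = {
        "a1": lambda: mem.__setitem__("x", 1),       # store(x,1)
        "a2": lambda: regs.__setitem__("v1", mem["y"]),  # v1 := load(y)
        "b1": lambda: mem.__setitem__("y", 1),       # store(y,1)
        "b2": lambda: regs.__setitem__("v2", mem["x"]),  # v2 := load(x)
    }
    for step in order:
        prog[step]()
    return regs["v1"], regs["v2"]

# Keep only interleavings preserving program order (a1 < a2, b1 < b2).
outcomes = {
    run_sb(p)
    for p in permutations(["a1", "a2", "b1", "b2"])
    if p.index("a1") < p.index("a2") and p.index("b1") < p.index("b2")
}
print((0, 0) in outcomes)  # False: forbidden under sequential consistency
print(sorted(outcomes))    # [(0, 1), (1, 0), (1, 1)]
```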

### **3.2 Language Syntax**

Programs in the language we consider manipulate *thread-local variables* l, l1, l2, ... ∈ LVar and *global variables* x, y, ... ∈ GVar, coming from disjoint sets LVar and GVar. Each variable stores a value from a finite set Val and is initialised to 0 ∈ Val. Constants are encoded by special read-only thread-local variables. We assume that each thread uses the same set of thread-local variable names LVar. The syntax of the programming language is as follows:

$$\begin{array}{lcl} C & ::= & l := E \mid \mathsf{store}(x, l) \mid l := \mathsf{load}(x) \mid l := \mathsf{LL}(x) \mid l' := \mathsf{SC}(x, l) \mid \mathsf{fence} \mid {} \\ & & C_1 \parallel C_2 \mid C_1; C_2 \mid \mathsf{if}\ (l)\ \{C_1\}\ \mathsf{else}\ \{C_2\} \mid \{-\} \\ E & ::= & l \mid l_1 = l_2 \mid l_1 \neq l_2 \mid \ldots \end{array}$$

Many of the constructs are standard. LL(x) and SC(x, l) are *load-link* and *store-conditional*, which are basic concurrency operations available on many platforms (e.g., Power and ARM). A load-link LL(x) behaves as a standard load of the global variable x. However, if it is followed by a store-conditional SC(x, l), the store fails and returns false if there are intervening writes to the same location. Otherwise the store-conditional writes l and returns true. The fence command is a *sequentially consistent fence*: interleaving such fences between all statements in a program guarantees sequentially consistent behaviour. We do not include a *compare-and-swap* (CAS) command in our language because LL-SC is more general [2]; hardware-level LL-SC is used to implement C11 CAS on Power and ARM. Our language does not include loops because our model in this paper does not include infinite computations (see Sect. 3.7 for discussion). As a result, loops can be represented by their finite unrollings. Our load commands write into a local variable. In examples, we sometimes use 'bare' loads without a variable write.

The construct {−} represents a block-shaped hole in the program. To simplify our presentation, we assume that at most one hole appears in the program. Transformations that apply to multiple blocks at once can be simulated by using the fact that our approach is compositional: transformations can be applied in sequence using different divisions of the program into code-block and context.

The set Prog of *whole programs* consists of programs without holes, while the set Contx of *contexts* consists of programs with a hole. The set Block of *code-blocks* consists of whole programs without parallel composition. We often write P ∈ Prog for a whole program, B ∈ Block for a code-block, and C ∈ Contx for a context. Given a context C and a code-block B, the composition C(B) is C with its hole syntactically replaced by B. For example:

$$C:\ \mathsf{load}(x);\ \{-\};\ \mathsf{store}(y,11) \qquad B:\ \mathsf{store}(x,2) \quad\Longrightarrow\quad C(B):\ \mathsf{load}(x);\ \mathsf{store}(x,2);\ \mathsf{store}(y,11)$$

We restrict Prog, Contx and Block to ensure LL-SC pairs are matched correctly. Each SC must be preceded in program order by a LL to the same location. Other types of operations may occur between the LL and SC, but intervening SC operations are forbidden. For example, the program LL(x); SC(x,v1); SC(x,v2); is forbidden. We also forbid LL-SC pairs from spanning parallel compositions, and from spanning the block/context boundary.
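The matching discipline can be phrased as a simple scan over a straight-line thread: every SC(x, _) needs a preceding LL(x), other kinds of operations may intervene, but a second SC without a fresh LL is rejected. The Python sketch below is our own illustration of this side condition, not the paper's definition:

```python
# Check the LL-SC matching discipline on one straight-line thread,
# given as a list of (kind, var) pairs.
def wellformed_llsc(thread):
    pending = None  # location of the active LL, if any
    for kind, var in thread:
        if kind == "LL":
            pending = var
        elif kind == "SC":
            if pending != var:
                return False  # SC without a matching preceding LL
            pending = None    # the SC consumes the LL
    return True

print(wellformed_llsc([("LL", "x"), ("store", "y"), ("SC", "x")]))  # True
print(wellformed_llsc([("LL", "x"), ("SC", "x"), ("SC", "x")]))     # False
```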

### **3.3 Memory Model Structure**

The semantics of a whole program P is given by a set ⟦P⟧ of *executions*, which consist of *actions*, representing memory events on global variables, and several relations on these. Actions are tuples in the set Action ≜ ActID × Kind × Option(GVar) × Val∗. In an action (a, k, z, b) ∈ Action: a ∈ ActID is the unique action identifier; k ∈ Kind is the kind of action (we use load, store, LL, SC, and the failed variant SCf in the semantics, and will introduce further kinds as needed); z ∈ Option(GVar) is an option type consisting of either a single global variable Just(x) or None; and b ∈ Val∗ is the vector of values (actions with multiple values are used in Sect. 4).

Given an action v, we use gvar(v) and val(v) as selectors for the different fields. We often write actions so as to elide action identifiers and the option type; for example, load(x, 3) stands for ∃i. (i, load, Just(x), [3]). We also sometimes elide values. We call load and LL actions *reads*, and store and successful SC actions *writes*. Given a set of actions A, we write, e.g., reads(A) to identify the read actions in A. Below, we range over arbitrary actions with u, v; read actions with r; write actions with w; and LL and SC actions with *ll* and *sc* respectively.
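As a concrete reading of this action structure, one might encode actions and their selectors as follows; this is an illustrative Python sketch, not the paper's mechanisation:

```python
# Actions as tuples in ActID × Kind × Option(GVar) × Val*, with the
# gvar and val selectors and a reads() filter.
from typing import NamedTuple, Optional, Tuple

class Action(NamedTuple):
    aid: int                # unique action identifier (ActID)
    kind: str               # 'load', 'store', 'LL', 'SC', 'SCf', ...
    gv: Optional[str]       # Just(x) encoded as 'x'; None otherwise
    vals: Tuple[int, ...]   # vector of values (Val*)

def gvar(v: Action) -> Optional[str]:
    return v.gv

def val(v: Action) -> int:
    return v.vals[0]

def reads(acts):
    """load and LL actions are reads."""
    return {v for v in acts if v.kind in ("load", "LL")}

r = Action(1, "load", "x", (3,))   # load(x, 3), identifier made explicit
w = Action(2, "store", "x", (1,))
print(gvar(r), val(r))             # x 3
print(reads({r, w}) == {r})        # True
```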

$$\begin{array}{lcl}
\langle l := \mathsf{load}(x), \sigma \rangle & \stackrel{\Delta}{=} & \{ (\{\mathsf{load}(x,a)\}, \emptyset, \sigma[l \mapsto a]) \mid a \in \mathsf{Val} \} \\
\langle \mathsf{store}(x,l), \sigma \rangle & \stackrel{\Delta}{=} & \{ (\{\mathsf{store}(x,a)\}, \emptyset, \sigma) \mid \sigma(l) = a \} \\
\langle C_1; C_2, \sigma \rangle & \stackrel{\Delta}{=} & \{ (\mathcal{A}_1 \mathbin{\dot\cup} \mathcal{A}_2,\ \mathsf{sb}_1 \cup \mathsf{sb}_2 \cup (\mathcal{A}_1 \times \mathcal{A}_2),\ \sigma_2) \mid (\mathcal{A}_1, \mathsf{sb}_1, \sigma_1) \in \langle C_1, \sigma \rangle \land (\mathcal{A}_2, \mathsf{sb}_2, \sigma_2) \in \langle C_2, \sigma_1 \rangle \} \\
\langle \mathsf{fence}, \sigma \rangle & \stackrel{\Delta}{=} & \{ (\{ll, sc\}, \{(ll, sc)\}, \sigma) \mid ll = \mathsf{LL}(\mathit{fen}, 0) \land sc = \mathsf{SC}(\mathit{fen}, 0) \}
\end{array}$$

**Fig. 2.** Selected clauses of the thread-local semantics. The full semantics is given in [10, Sect. A]. We write A1 ∪̇ A2 for a union that is defined only when the actions in A1 and A2 use disjoint sets of identifiers. We omit identifiers from actions to avoid clutter.

The semantics of a program P ∈ Prog is defined in two stages. First, a *thread-local semantics* of P produces a set ⟨P⟩ of *pre-executions* (A, sb) ∈ PreExec. A pre-execution contains a finite set of memory actions A ⊆ Action that could be produced by the program. It has a transitive and irreflexive *sequence-before* relation sb ⊆ A × A, which defines the sequential order imposed by the program syntax.

For example two sequential statements in the same thread produce actions ordered in sb. The thread-local semantics takes into account control flow in P's threads and operations on local variables. However, it does not constrain the behaviour of global variables: the values threads read from them are chosen arbitrarily. This is addressed by extending pre-executions with extra relations, and filtering the resulting *executions* using *validity axioms*.

#### **3.4 Thread-Local Semantics**

The thread-local semantics is defined formally in Fig. 2, using a function ⟨−, −⟩ : Prog × VMap → P(PreExec × VMap). The values of local variables are tracked by a map σ ∈ VMap ≜ LVar → Val. Given a program and an input local-variable map, the function produces a set of pre-executions paired with an output variable map, representing the values of local variables at the end of the execution. Let σ0 map every local variable to 0. Then ⟨P⟩, the thread-local semantics of a program P, is defined as

$$\langle P \rangle \quad \stackrel{\Delta}{=} \quad \{ (\mathcal{A}, \mathbf{sb}) \mid \exists \sigma'. (\mathcal{A}, \mathbf{sb}, \sigma') \in \langle P, \sigma\_0 \rangle \}$$

The significant property of the thread-local semantics is that it does not restrict the behaviour of global variables. For this reason, note that the clause for load in Fig. <sup>2</sup> leaves the value <sup>a</sup> unrestricted. We follow [16] in encoding the fence command by a successful LL-SC pair to a distinguished variable *fen* <sup>∈</sup> GVar that is not otherwise read or written.
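To make the unrestricted-load point concrete, the load clause of Fig. 2 can be rendered as a function returning one outcome per possible value. A Python sketch under a small assumed Val = {0, 1, 2}:

```python
# The load clause: <l := load(x), sigma> yields, for each a in Val, a
# pre-execution with one load action and an updated local-variable map.
VALS = range(3)  # illustrative finite Val

def sem_load(l, x, sigma, fresh_id):
    """Returns a list of (actions, sb, sigma') triples, one per value a."""
    return [
        (
            {(fresh_id, "load", x, (a,))},  # singleton action set A
            set(),                          # sb is empty for one action
            {**sigma, l: a},                # sigma[l -> a]
        )
        for a in VALS
    ]

results = sem_load("l1", "x", {"l1": 0}, fresh_id=1)
print(len(results))  # 3: one pre-execution per value in Val
```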

#### **3.5 Execution Structure and Validity Axioms**

The semantics of a program P is a set ⟦P⟧ of *executions* X = (A, sb, at, rf, mo, hb) ∈ Exec, where (A, sb) is a pre-execution and at, rf, mo, hb ⊆ A × A. Given an execution X, we sometimes write A(X), sb(X), ... as selectors for the appropriate set or relation. The relations have the following purposes: rf (*reads-from*) relates each write to the reads that take their value from it; mo (*modification order*) orders the writes to each location; hb (*happens-before*) is the temporal order, derived below; and at (*atomicity*) relates each LL to its matching SC.


The semantics ⟦P⟧ of a program P is the set of executions X ∈ Exec compatible with the thread-local semantics and the *validity axioms*, denoted valid(X):

$$\llbracket P \rrbracket \quad \stackrel{\Delta}{=} \quad \{ X \mid (\mathcal{A}(X), \mathsf{sb}(X)) \in \langle P \rangle \land \mathsf{valid}(X) \} \tag{3}$$

The validity axioms on an execution (A,sb, at,rf, mo, hb) are:

– HBdef: hb = (sb ∪ rf)+ *and* hb is acyclic.

This axiom defines hb and enforces the intuitive property that there are no cycles in the temporal order. It also prevents an action reading from its hb-future: as rf is included in hb, this would result in a cycle.

– HBvsMO: $\neg\exists w_1, w_2.\ w_1 \xrightarrow{\mathsf{hb}} w_2 \land w_2 \xrightarrow{\mathsf{mo}} w_1$

This axiom requires that the order in which writes to a location become visible to threads cannot contradict the temporal order. But take note that writes may be ordered in mo but not hb.

– Coherence: $\neg\exists w_1, w_2, r.\ w_1 \xrightarrow{\mathsf{mo}} w_2 \xrightarrow{\mathsf{hb}} r \land w_1 \xrightarrow{\mathsf{rf}} r$

This axiom generalises the sequentially consistent prohibition on reading overwritten values. If two writes are ordered in mo, then intuitively the second overwrites the first. A read that follows some write in hb or mo cannot read from writes earlier in mo: these earlier writes have been overwritten. However, unlike in sequential consistency, hb is partial, so there may be multiple writes that an action can legally read from.

– RFval: $\forall r.\ (\neg\exists w.\ w \xrightarrow{\mathsf{rf}} r) \implies (\mathsf{val}(r) = 0 \land \neg\exists w.\ w \xrightarrow{\mathsf{hb}} r \land \mathsf{gvar}(w) = \mathsf{gvar}(r))$

Most reads must take their value from a write, represented by an rf edge. However, the RFval axiom allows the rf edge to be omitted if the read takes the initial value 0 and there is no hb-earlier write to the same location. Intuitively, an hb-earlier write would supersede the initial value in a similar way to Coherence.

– Atom: $\neg \exists w_1, w_2, ll, sc.\ w_1 \xrightarrow{\mathsf{rf}} ll \xrightarrow{\mathsf{at}} sc \,\land\, w_1 \xrightarrow{\mathsf{mo}} w_2 \xrightarrow{\mathsf{mo}} sc$

This axiom is adapted from [16]. For an LL-SC pair *ll* and *sc*, it ensures that there is no mo-intervening write $w_2$ that would invalidate the store.

Our model forbids the problematic relaxed behaviour of the message-passing (MP) program in Fig. 1 that yields b = 1 ∧ r = 0. Figure 3 shows an (invalid) execution that would exhibit this behaviour. To avoid clutter, here and in the following we omit hb edges obtained by transitivity, as well as local variable values. This execution is allowed by the thread-local semantics of the MP program, but it is ruled out by the Coherence validity axiom. As hb is transitively closed, there is a derived hb edge store(x, 1) −hb→ load(x, 0), which forms a Coherence violation. Thus, this is not an execution of the MP program. Indeed, any

**Fig. 3.** An invalid execution of MP.

execution ending in load(x, 0) is forbidden for the same reason, meaning that the MP relaxed behaviour cannot occur.
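The axioms above are simple enough to prototype as executable checks over finite edge sets. The following sketch is ours, not the paper's formalisation (the dict/pair-set encoding, the function names, and the explicit initialisation write on x are all illustrative); it shows an execution in the shape of Fig. 3 being rejected by Coherence:

```python
from itertools import product

def transitive_closure(edges):
    """Naive transitive closure of a binary relation given as a pair set."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def valid(actions, sb, rf, mo):
    """Return 'valid' or the name of the first violated axiom.

    actions: id -> (kind, variable, value); sb, rf, mo: sets of id pairs.
    """
    hb = transitive_closure(sb | rf)                  # HBdef
    if any((a, a) in hb for a in actions):
        return "HBdef"                                # hb must be acyclic
    if any((w1, w2) in hb for (w2, w1) in mo):        # HBvsMO
        return "HBvsMO"
    for (w1, w2) in mo:                               # Coherence
        if any(w == w1 and (w2, r) in hb for (w, r) in rf):
            return "Coherence"
    sourced = {r for (_, r) in rf}                    # RFval
    for r, (kind, var, val) in actions.items():
        if kind == "load" and r not in sourced:
            if val != 0 or any(k == "store" and v == var and (w, r) in hb
                               for w, (k, v, _) in actions.items()):
                return "RFval"
    return "valid"

# The invalid MP execution, with an illustrative initialisation write i on x:
acts = {"i": ("store", "x", 0), "a": ("store", "x", 1),
        "b": ("store", "f", 1), "c": ("load", "f", 1),
        "d": ("load", "x", 0)}
sb, rf, mo = {("a", "b"), ("c", "d")}, {("b", "c"), ("i", "d")}, {("i", "a")}
print(valid(acts, sb, rf, mo))  # → Coherence
```

If the final load instead reads x = 1 from store(x, 1) (rf edge ("a", "d")), the same checker reports the execution valid.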

### **3.6 Relaxed Observations**

Finally, we define a notion of observational refinement suitable for our relaxed model. We assume a subset of *observable* global variables, OVar ⊆ GVar, which can only be accessed by the context and not by the code-block. We consider the actions and the hb relation on these variables to be the observations. We write X|OVar for the projection of X's action set and relations to OVar, and use this to define ⊑ex for our model:

$$X \sqsubseteq_{\mathsf{ex}} Y \quad \stackrel{\Delta}{\iff} \quad \mathcal{A}(X|_{\mathsf{OVar}}) = \mathcal{A}(Y|_{\mathsf{OVar}}) \land \mathsf{hb}(Y|_{\mathsf{OVar}}) \subseteq \mathsf{hb}(X|_{\mathsf{OVar}})$$

This is lifted to programs and blocks as in Sect. 2, def. (1) and (2). Note that in the more abstract execution, actions on observable variables must be the same, but hb can be weaker. This is because we interpret hb as a constraint on time order: two actions that are unordered in hb could have occurred in either order, or in parallel. Thus, weakening hb allows more observable behaviours (see Sect. 2).
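On finite executions, ⊑ex can be computed directly by projecting to the observable variables and comparing action sets and hb. The sketch below uses our own encoding (dicts and pair sets; all names illustrative), not the paper's:

```python
# Sketch of X ⊑ex Y: equal observable actions, with hb(Y) ⊆ hb(X) after
# projection. An execution is a dict with an action map and an hb pair set.

def project(actions, hb, ovar):
    """Restrict actions (id -> (kind, var, val)) and hb to variables in ovar."""
    keep = {a for a, (_, var, _) in actions.items() if var in ovar}
    return ({a: actions[a] for a in keep},
            {(u, v) for (u, v) in hb if u in keep and v in keep})

def obs_refines(x, y, ovar):
    ax, hx = project(x["actions"], x["hb"], ovar)
    ay, hy = project(y["actions"], y["hb"], ovar)
    return ax == ay and hy <= hx

x = {"actions": {"a": ("store", "o", 1), "b": ("store", "o", 2),
                 "c": ("store", "p", 1)},
     "hb": {("a", "b"), ("a", "c")}}
y = {"actions": dict(x["actions"]), "hb": {("a", "c")}}
print(obs_refines(x, y, {"o"}))  # → True: y weakens hb on observable o
```

The reverse direction fails here, since y's projected hb does not contain x's edge ("a", "b") – weakening hb makes an execution more abstract, never less.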

### **3.7 Differences from C11**

Our language's memory model is derived from the C11 formalisation in [5], with a number of simplifications. We chose C11 because it demonstrates most of the important features of axiomatic language models. However, we do not target the precise C11 model: rather we target an abstracted model that is rich enough to demonstrate our approach. Relaxed language semantics is still a very active topic of research, and several C11 features are known to be significantly flawed, with multiple competing fixes proposed. Some of our differences from [5] are intended to avoid such problematic features so that we can cleanly demonstrate our approach.

In C11 terms, our model covers release-acquire and non-atomic operations (the latter addressed in Sect. 7), and sequentially consistent fences. We deviate from C11 in the following ways:


### **4 Denotations of Code-Blocks**

We construct the denotation for a code-block in two steps: (1) generate the *block-local* executions under a set of special cut-down contexts; (2) from each execution, extract a summary of interactions between the code-block and the context called a *history*.

### **4.1 Block-Local Executions**

The block-local executions of a block B ∈ Block omit context structure such as syntax and actions on variables not accessed in the block. Instead the context is represented by special actions call and ret, a set $\mathcal{A}_B$, and relations $R_B$ and $S_B$, each covering an aspect of the interaction of the block and an arbitrary unrestricted context. Together, each choice of call, ret, $\mathcal{A}_B$, $R_B$, and $S_B$ abstractly represents a set of possible syntactic contexts. By quantifying over the possible values of these parameters, we cover the behaviour of *all* syntactic contexts. The parameters are defined as follows:


– *Context happens-before.* The context can generate hb edges between its actions, which affect the behaviour of the block. We track these effects with a relation $R_B$ over actions in $\mathcal{A}_B$, call, and ret:

$$R\_B \subseteq (\mathcal{A}\_B \times \mathcal{A}\_B) \cup (\mathcal{A}\_B \times \{\textsf{call}\}) \cup (\{\textsf{ret}\} \times \mathcal{A}\_B) \tag{4}$$

The context can generate hb edges between actions directly if they are on the same thread, or indirectly through inter-thread reads. Likewise call/ret may be related to context actions on the same or different threads.

– *Context atomicity.* The context can generate at edges between its actions that we capture in the relation <sup>S</sup><sup>B</sup> ⊆ A<sup>B</sup> × AB. We require this relation to be an injective function from LL to SC actions. We consider only cases where LL/SC pairs do not cross block boundaries, so we need not consider boundary-crossing at edges.
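These shape conditions are directly checkable on candidate context parameters. The sketch below is ours (helper names and encodings are illustrative): it validates that an $R_B$ has only the edge shapes allowed by def. (4), and that an $S_B$ is an injective function from LL actions to SC actions:

```python
def wf_RB(RB, AB):
    """R_B ⊆ (A_B × A_B) ∪ (A_B × {call}) ∪ ({ret} × A_B), per def. (4)."""
    return all((u in AB and v in AB) or (u in AB and v == "call")
               or (u == "ret" and v in AB)
               for (u, v) in RB)

def wf_SB(SB, lls, scs):
    """S_B must be an injective function from LL actions to SC actions."""
    dom = [u for (u, _) in SB]
    rng = [v for (_, v) in SB]
    return (all(u in lls and v in scs for (u, v) in SB)
            and len(set(dom)) == len(dom)    # functional on LL
            and len(set(rng)) == len(rng))   # injective into SC

print(wf_RB({("a", "call"), ("ret", "b")}, {"a", "b"}))  # → True
print(wf_RB({("call", "a")}, {"a"}))  # → False: call cannot be a source
```

Note that def. (4) allows edges *into* call and *out of* ret, but not the reverse, which is what the second check rejects.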

Together, call, ret, $\mathcal{A}_B$, $R_B$, and $S_B$ represent a limited context, stripped of syntax, of the relations sb, mo, and rf, and of actions on global variables outside VS<sub>B</sub>. When constructing block-local executions, we represent all possible interactions by quantifying over all possible choices of σ, σ′, $\mathcal{A}_B$, $R_B$ and $S_B$. The set $[\![B, \mathcal{A}_B, R_B, S_B]\!]$ contains all executions of B under this special limited context. Formally, an execution X = (A, sb, at, rf, mo, hb) is in this set if:


We say that $\mathcal{A}_B$, $R_B$ and $S_B$ are *consistent with* B if they act over variables in the set VS<sub>B</sub>. In the rest of the paper we only consider consistent choices of $\mathcal{A}_B$, $R_B$, $S_B$. The *block-local executions* of B are then all executions $X \in [\![B, \mathcal{A}_B, R_B, S_B]\!]$.<sup>1</sup>

<sup>1</sup> This definition relies on the fact that our language supports a fixed set of global variables, not dynamically allocated addressable memory (see Sect. 3.7). We believe that in the future our results can be extended to support dynamic memory. For this, the block-local construction would need to quantify over actions on all possible memory locations, not just the static variable set VS*B*. The rest of our theory would remain the same, because C11-style models grant no special status to pointer values. Cutting down to a finite denotation, as in Sect. 5 below, would require some extra abstraction over memory – for example, a separation logic domain such as [9].

**Fig. 4.** *Left*: block-local execution. *Right*: corresponding history.

*Example Block-Local Execution.* The left of Fig. 4 shows a block-local execution for the code-block

$$\texttt{l1} := \texttt{load(f)};\ \texttt{l2} := \texttt{load(x)} \tag{5}$$

Here the set VS<sub>B</sub> of accessed global variables is {f, x}. As before, we omit local variables to avoid clutter. The context action set $\mathcal{A}_B$ consists of the three stores, and $R_B$ is denoted by dotted edges.

In this execution, both $\mathcal{A}_B$ and $R_B$ affect the behaviour of the code-block. The following path is generated by $R_B$ and the load of f = 1:

$$\mathsf{store}(\mathtt{x},2) \xrightarrow{\mathsf{mo}} \mathsf{store}(\mathtt{x},1) \xrightarrow{R_B} \mathsf{store}(\mathtt{f},1) \xrightarrow{\mathsf{rf}} \mathsf{load}(\mathtt{f},1) \xrightarrow{\mathsf{sb}} \mathsf{load}(\mathtt{x},1)$$

Because hb includes sb, rf, and $R_B$, there is a transitive edge store(x, 1) −hb→ load(x, 1). The edge store(x, 2) −mo→ store(x, 1) is forced because the HBvsMO axiom prohibits mo from contradicting hb. Consequently, the Coherence axiom forces the code-block to read x = 1.

### **4.2 Histories**

From any block-local execution X, its *history* summarises the interactions between the code-block and the context. Informally, the history records hb over context actions, call, and ret. More formally the history, written hist(X), is a pair (A, G) consisting of an action set <sup>A</sup> and *guarantee relation* <sup>G</sup> ⊆A×A. Recall that we use contx(X) to denote the set of context actions in X. Using this, we define the history as follows:


$$G \stackrel{\Delta}{=} \mathsf{hb}(X) \cap \big( (\mathsf{contx}(X) \times \mathsf{contx}(X)) \cup (\mathsf{contx}(X) \times \{\mathsf{ret}\}) \cup (\{\mathsf{call}\} \times \mathsf{contx}(X)) \big)\tag{6}$$

The guarantee summarises the code-block's effect on its context: it suffices to track only hb and ignore the other relations. Note that the definition of the guarantee is similar to that of the context relation $R_B$, definition (4). The difference is that call and ret are

**Fig. 5.** Executions and histories illustrating the guarantee relation.

switched: this is because the guarantee represents hb edges generated by the code-block, while R<sup>B</sup> represents the edges generated by the context. The right of Fig. 4 shows the history corresponding to the block-local execution on the left.
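Extracting a history from a block-local execution is then a small computation: restrict the (transitively closed) hb relation to context actions, call, and ret, with the orientation of def. (6). The encoding below is our own and purely illustrative:

```python
def history(actions, hb, code):
    """hist(X): keep hb edges among context actions, into ret, and out of call.

    actions: set of action ids (including "call" and "ret"); hb must already
    be transitively closed; code: the block's own actions.
    """
    contx = actions - code - {"call", "ret"}
    allowed = ({(u, v) for u in contx for v in contx}
               | {(u, "ret") for u in contx}
               | {("call", v) for v in contx})
    return contx | {"call", "ret"}, hb & allowed

# A context store c1, a block action b1, and transitively closed hb edges:
acts = {"call", "c1", "b1", "ret"}
hb = {("call", "b1"), ("c1", "b1"), ("b1", "ret"),
      ("c1", "ret"), ("call", "ret")}
a, g = history(acts, hb, {"b1"})
print(g)  # → {('c1', 'ret')}: only the context-visible hb edge survives
```

Edges through the block action b1 disappear, but their context-visible consequence ("c1", "ret") is retained in the guarantee, matching the intent of Fig. 4.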

To see the interactions captured by the guarantee, compare the block given in def. (5) with the block l2:=load(x). These blocks have differing effects on the following syntactic context:

```
store(y,1); store(y,2); store(f,1) || {-}; l3:=load(y)
```
For the two-load block embedded into this context, l1 = 1 <sup>∧</sup> l3 = 1 is not a possible post-state. For the single-load block, this post-state is permitted.<sup>2</sup>

In Fig. 5, we give executions for both blocks embedded into this context. We draw the context actions that are not included in the history in grey. In these executions, the code-block determines whether the load of y can read value 1 (represented by the edge labelled 'rf?'). In the first execution, the context load of y cannot read 1, because there is the path store(y, 1) −mo→ store(y, 2) −hb→ load(y), which would contradict the Coherence axiom. In the second execution there is no such path, and the load may read 1.

It is desirable for our denotation to hide the precise operations inside the block – this lets it relate syntactically distinct blocks. Nonetheless, the history must record hb effects such as those above that are visible to the context. In Execution 1, the Coherence violation is still visible if we only consider context operations, call, ret, and the guarantee G – i.e. the history. In Execution 2, the fact that the read is permitted is likewise visible from examining the history. Thus the guarantee, combined with the local variable post-states, captures the effect of the block on the context without recording the actions inside the block.

<sup>2</sup> We choose these post-states for exposition purposes – in fact these blocks are also distinguishable through local variable l1 alone.

### **4.3 Comparing Denotations**

The denotation of a code-block B is the set of histories of block-local executions of B under each possible context, i.e. the set

$$\{\mathsf{hist}(X) \mid \exists \mathcal{A}_B, R_B, S_B.\ X \in [\![B, \mathcal{A}_B, R_B, S_B]\!] \}$$

To compare the denotations of two code-blocks, we first define a *refinement relation* on histories: $(\mathcal{A}_1, G_1) \sqsubseteq_{\mathsf{h}} (\mathcal{A}_2, G_2)$ holds iff $\mathcal{A}_1 = \mathcal{A}_2 \land G_2 \subseteq G_1$. The history $(\mathcal{A}_2, G_2)$ places fewer restrictions on the context than $(\mathcal{A}_1, G_1)$ – a weaker guarantee corresponds to more observable behaviours. For example, in Fig. 5, *History 1* $\sqsubseteq_{\mathsf{h}}$ *History 2* but not vice versa, which reflects the fact that History 1 rules out the read pattern discussed above.

We write $B_1 \sqsubseteq_{\mathsf{q}} B_2$ to state that the denotation of $B_1$ *refines* that of $B_2$. The subscript 'q' stands for the fact that we *quantify* over both $\mathcal{A}$ and $R_B$. We define $\sqsubseteq_{\mathsf{q}}$ by lifting $\sqsubseteq_{\mathsf{h}}$:

$$\begin{array}{rcl}B_1 \sqsubseteq_{\mathsf{q}} B_2 & \stackrel{\Delta}{\iff} & \forall \mathcal{A}, R, S.\ \forall X_1 \in [\![B_1, \mathcal{A}, R, S]\!]. \\ & & \exists X_2 \in [\![B_2, \mathcal{A}, R, S]\!].\ \mathsf{hist}(X_1) \sqsubseteq_{\mathsf{h}} \mathsf{hist}(X_2) \end{array} \tag{7}$$

In other words, two code-blocks are related $B_1 \sqsubseteq_{\mathsf{q}} B_2$ if for every block-local execution of $B_1$, there is a corresponding execution of $B_2$ with a related history. Note that the corresponding history must be constructed under the same cut-down context $\mathcal{A}, R, S$.
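Once denotations are finite sets, this comparison is mechanical. The sketch below is our own encoding (a denotation as a dict from a context key to a list of (action-set, guarantee) histories); it implements ⊑h and its lifting from def. (7):

```python
def hist_refines(h1, h2):
    """(A1, G1) ⊑h (A2, G2): same action set, weaker guarantee G2 ⊆ G1."""
    (a1, g1), (a2, g2) = h1, h2
    return a1 == a2 and g2 <= g1

def block_refines(den1, den2):
    """B1 ⊑q B2: every history of B1 is matched under the same context."""
    return all(any(hist_refines(h1, h2) for h2 in den2.get(ctx, []))
               for ctx, hists in den1.items()
               for h1 in hists)

A = frozenset({"w"})
den1 = {"ctx0": [(A, {("w", "ret")})]}   # stronger (larger) guarantee
den2 = {"ctx0": [(A, set())]}            # weaker (smaller) guarantee
print(block_refines(den1, den2))  # → True
print(block_refines(den2, den1))  # → False
```

The asymmetry matches the intuition above: a weaker guarantee places fewer restrictions on the context, so it can only appear on the right of the refinement.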

**Theorem 1** (Adequacy of $\sqsubseteq_{\mathsf{q}}$)**.** $B_1 \sqsubseteq_{\mathsf{q}} B_2 \implies B_1 \sqsubseteq_{\mathsf{bl}} B_2$.

**Theorem 2** (Full abstraction of $\sqsubseteq_{\mathsf{q}}$)**.** $B_1 \sqsubseteq_{\mathsf{bl}} B_2 \implies B_1 \sqsubseteq_{\mathsf{q}} B_2$.

As a corollary of the above theorems, a program transformation $B_2 \rightsquigarrow B_1$ is valid if and only if $B_1 \sqsubseteq_{\mathsf{q}} B_2$ holds. We prove Theorem 1 in [10, Sect. B]. We give a proof sketch of Theorem 2 in Sect. 8 and a full proof in [10, Sect. F].

**Fig. 6.** History comparison for an example program transformation.

#### **4.4 Example Transformation**

We now consider how our approach applies to a simple program transformation:

$$B_2:\ \texttt{store(x,l1)};\ \texttt{store(x,l1)} \quad\rightsquigarrow\quad B_1:\ \texttt{store(x,l1)}$$

To verify this transformation, we must show that $B_1 \sqsubseteq_{\mathsf{q}} B_2$. To do this, we must consider the unboundedly many block-local executions. Here we just illustrate the reasoning for a single block-local execution; in Sect. 5 below we define a context reduction which lets us consider a finite set of such executions.

In Fig. 6, we illustrate the necessary reasoning for an execution $X_1 \in [\![B_1, \mathcal{A}, R, S]\!]$, with a context action set $\mathcal{A}$ consisting of a single load of x = 1, a context relation R relating ret to the load, and an empty S relation. This choice of R forces the context load to read from the store in the block. We can exhibit an execution $X_2 \in [\![B_2, \mathcal{A}, R, S]\!]$ with a matching history by making the context load read from the final store in the block.

### **5 A Finite Denotation**

The approach above simplifies contexts by removing syntax and non-hb structure, but there are still infinitely many <sup>A</sup>/R/S contexts for any code-block. To solve this, we introduce a type of context reduction which allows us to consider only finitely many block-local executions. This means that we can automatically check transformations by examining all such executions. However this 'cut down' approach is no longer fully abstract. We modify our denotation as follows:


These two steps are both necessary to achieve finiteness. Removing the R relation reduces the amount of structure in the context. This makes it possible to then remove redundant patterns – for example, duplicate reads from the same write.

Before defining the two steps in detail, we give the structure of our modified refinement $\sqsubseteq_{\mathsf{c}}$. In the definition, $\mathsf{hist}_{\mathsf{E}}(X)$ stands for the *extended history* of an execution X, and $\sqsubseteq_{\mathsf{E}}$ for refinement on extended histories.

$$\begin{array}{rcl} B_1 \sqsubseteq_{\mathsf{c}} B_2 & \stackrel{\Delta}{\iff} & \forall \mathcal{A}, S.\ \forall X_1 \in [\![B_1, \mathcal{A}, \emptyset, S]\!].\ \mathsf{cut}(X_1) \implies \\ & & \exists X_2 \in [\![B_2, \mathcal{A}, \emptyset, S]\!].\ \mathsf{hist}_{\mathsf{E}}(X_1) \sqsubseteq_{\mathsf{E}} \mathsf{hist}_{\mathsf{E}}(X_2) \end{array} \tag{8}$$

As with $\sqsubseteq_{\mathsf{q}}$ above, the refinement $\sqsubseteq_{\mathsf{c}}$ is adequate. However, it is not fully abstract (we provide a counterexample in [10, Sect. D]). We prove the following theorem in [10, Sect. E].

**Theorem 3** (Adequacy of $\sqsubseteq_{\mathsf{c}}$)**.** $B_1 \sqsubseteq_{\mathsf{c}} B_2 \implies B_1 \sqsubseteq_{\mathsf{bl}} B_2$.

### **5.1 Cutting Predicate**

Removing the context relation R in definition (8) removes a large amount of structure from the context. However, there are still unboundedly many block-local executions with an empty R – for example, we can have an unbounded number of reads and writes that do not interact with the block. The cutting predicate identifies these redundant executions.

We first identify the actions in a block-local execution that are *visible*, meaning they directly interact with the block. We write code(X) for the set of actions in X generated by the code-block. Visible actions belong to code(X), read from code(X), or are read by code(X). In other words,

$$\mathsf{vis}(X) \stackrel{\Delta}{=} \mathsf{code}(X) \cup \{ u \mid \exists v \in \mathsf{code}(X).\ u \xrightarrow{\mathsf{rf}} v \lor v \xrightarrow{\mathsf{rf}} u \}$$

Informally, cutting eliminates three redundant patterns: *(i)* non-visible context reads, i.e. reads from context writes; *(ii)* duplicate context reads from the same write; and *(iii)* duplicate non-visible writes that are not separated in mo by a visible write. Formally, we define cut′(X) as the conjunction of cutR for reads and cutW for writes.

$$\begin{array}{rcl} \mathsf{cutR}(X) & \stackrel{\Delta}{\iff} & \mathsf{reads}(X) \subseteq \mathsf{vis}(X)\ \land \\ & & \forall r_1, r_2 \in \mathsf{contx}(X).\ (r_1 \neq r_2 \implies \neg\exists w.\ w \xrightarrow{\mathsf{rf}} r_1 \land w \xrightarrow{\mathsf{rf}} r_2) \\ \mathsf{cutW}(X) & \stackrel{\Delta}{\iff} & \forall w_1, w_2 \in (\mathsf{contx}(X) \setminus \mathsf{vis}(X)). \\ & & \quad w_1 \xrightarrow{\mathsf{mo}} w_2 \implies \exists w_3 \in \mathsf{vis}(X).\ w_1 \xrightarrow{\mathsf{mo}} w_3 \xrightarrow{\mathsf{mo}} w_2 \\ \mathsf{cut}'(X) & \stackrel{\Delta}{\iff} & \mathsf{cutR}(X) \land \mathsf{cutW}(X) \end{array}$$

The final predicate cut(X) extends this in order to keep LL-SC pairs together: it requires that, if cut′() permits one half of an LL-SC pair, the other half is implicitly permitted as well (for brevity we omit the formal definition of cut() in terms of cut′()).


**Fig. 7.** *Left*: block-local execution which includes patterns forbidden by cut(). *Right*: key explaining the patterns forbidden or allowed.

It should be intuitively clear why the first two of the above patterns are redundant. The main surprise is the third pattern, which preserves some non-visible writes. This is required by Theorem 3 for technical reasons connected to per-location coherence. We illustrate the application of cut() to a block-local execution in Fig. 7.
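The cutting predicate is straightforwardly executable on finite executions. The following is our own sketch of cut′ (LL-SC pairing, handled by the full cut(), is omitted); the encodings and names are illustrative:

```python
def visible(code, rf):
    """vis(X): block actions plus anything reading from or read by them."""
    return (code | {u for (u, v) in rf if v in code}
                 | {v for (u, v) in rf if u in code})

def cut_prime(actions, code, rf, mo, reads):
    vis = visible(code, rf)
    contx = actions - code
    # cutR: all reads visible, and no two context reads from the same write
    cut_r = (reads <= vis and
             not any(r1 != r2 and (w, r1) in rf and (w, r2) in rf
                     for w in actions for r1 in contx for r2 in contx))
    # cutW: mo-adjacent non-visible writes must be separated by a visible one
    nonvis = contx - vis
    cut_w = all((w1, w2) not in mo
                or any((w1, w3) in mo and (w3, w2) in mo for w3 in vis)
                for w1 in nonvis for w2 in nonvis)
    return cut_r and cut_w

# A context write read by the block survives cutting...
print(cut_prime({"b", "w"}, {"b"}, {("w", "b")}, set(), set()))  # → True
# ...but two context reads from the same write are redundant:
print(cut_prime({"b", "w", "r1", "r2"}, {"b"},
                {("w", "r1"), ("w", "r2")}, set(), {"r1", "r2"}))  # → False
```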

### **5.2 Extended History (histE)**

In our approach, each block-local execution represents a pattern of interaction between block and context. In our previous definition of $\sqsubseteq_{\mathsf{q}}$, constraints imposed by the block are captured by the guarantee, while constraints imposed by the context are captured by the R relation. The definition (8) of $\sqsubseteq_{\mathsf{c}}$ removes the context relation R, but these constraints must still be represented. Instead, we replace R with a history component called a *deny*. This simplifies the block-local executions, but compensates by recording more in the denotation.

The deny records the hb edges that *cannot* be enforced due to the execution structure. For example, consider the block-local execution<sup>3</sup> of Fig. 8.

This pattern could not occur in a context that generates the dashed edge D as an hb edge – to do so would violate the HBvsMO axiom. In our previous definition of $\sqsubseteq_{\mathsf{q}}$, we explicitly represented the presence or absence of this edge through the R relation. In our new formulation, we represent such 'forbidden' edges in the history by a deny edge.

**Fig. 8.** A deny edge.

The *extended history* of an execution X, written $\mathsf{hist}_{\mathsf{E}}(X)$, is a triple (A, G, D), consisting of the familiar notions of action set A and guarantee G ⊆ A × A, together with a deny relation D ⊆ A × A, defined below:

$$\begin{array}{rcl} D & \stackrel{\Delta}{=} & \{(u, v) \mid \mathsf{HBvsMO}\text{-}\mathsf{d}(u, v) \lor \mathsf{Cohere}\text{-}\mathsf{d}(u, v) \lor \mathsf{RFval}\text{-}\mathsf{d}(u, v)\}\ \cap \\ & & \big( (\mathsf{contx}(X) \times \mathsf{contx}(X)) \cup (\mathsf{contx}(X) \times \{\mathsf{call}\}) \cup (\{\mathsf{ret}\} \times \mathsf{contx}(X)) \big) \end{array}$$

Each of the predicates HBvsMO-d, Cohere-d, and RFval-d generates the deny for one validity axiom. In the definitions below, the deny edge is the pair (u, v), and $\mathsf{hb}^*$ is the reflexive-transitive closure of hb:

$$\begin{array}{rcl} \mathsf{HBvsMO}\text{-}\mathsf{d}(u, v) & \stackrel{\Delta}{\iff} & \exists w_1, w_2.\ w_2 \xrightarrow{\mathsf{mo}} w_1 \land w_1 \xrightarrow{\mathsf{hb}^*} u \land v \xrightarrow{\mathsf{hb}^*} w_2 \\ \mathsf{Cohere}\text{-}\mathsf{d}(u, v) & \stackrel{\Delta}{\iff} & \exists w_1, w_2, r.\ w_1 \xrightarrow{\mathsf{mo}} w_2 \land w_1 \xrightarrow{\mathsf{rf}} r \land w_2 \xrightarrow{\mathsf{hb}^*} u \land v \xrightarrow{\mathsf{hb}^*} r \\ \mathsf{RFval}\text{-}\mathsf{d}(u, v) & \stackrel{\Delta}{\iff} & \exists w, r.\ \mathsf{gvar}(w) = \mathsf{gvar}(r) \land (\neg\exists w'.\ w' \xrightarrow{\mathsf{rf}} r) \ \land \\ & & \quad w \xrightarrow{\mathsf{hb}^*} u \land v \xrightarrow{\mathsf{hb}^*} r \end{array}$$

<sup>3</sup> We use this execution for illustration, but in fact the cut() predicate would forbid the load.

One can think of a deny edge as an 'almost' violation of an axiom. For example, if HBvsMO-d(u, v) holds, then the context cannot generate an extra hb edge $u \xrightarrow{\mathsf{hb}} v$ – to do so would violate HBvsMO.
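The deny generators can likewise be prototyped by brute force over a finite execution. This sketch is ours (encodings and names are illustrative, and we omit the restriction of D to context actions, call, and ret):

```python
def hb_star(hb, actions):
    """Reflexive-transitive closure of hb over the given action set."""
    star = set(hb) | {(a, a) for a in actions}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(star):
            for (c, d) in list(star):
                if b == c and (a, d) not in star:
                    star.add((a, d))
                    changed = True
    return star

def deny(actions, hb, rf, mo, reads, gvar):
    hbs = hb_star(hb, actions)
    sourced = {r for (_, r) in rf}
    writes = actions - reads
    d = set()
    for u in actions:
        for v in actions:
            # adding u ->hb v would close w1 ->hb w2 against w2 ->mo w1
            hbvsmo = any((w1, u) in hbs and (v, w2) in hbs
                         for (w2, w1) in mo)
            # ...or close w2 ->hb r with w1 ->mo w2 and w1 ->rf r
            cohere = any((w1, w2) in mo and (w2, u) in hbs and (v, r) in hbs
                         for (w1, r) in rf for w2 in actions)
            # ...or put a same-variable write hb-before an rf-less read
            rfval = any(r not in sourced and gvar[w] == gvar[r]
                        and (w, u) in hbs and (v, r) in hbs
                        for w in writes for r in reads)
            if hbvsmo or cohere or rfval:
                d.add((u, v))
    return d

# Two writes with w2 -mo-> w1: the context must not add w1 -hb-> w2.
print(deny({"w1", "w2"}, set(), set(), {("w2", "w1")},
           set(), {"w1": "x", "w2": "x"}))  # → {('w1', 'w2')}
```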

Because deny edges represent constraints on the context, weakening the deny places fewer constraints, allowing more behaviours, so we compare them with relational inclusion:

$$(\mathcal{A}_1, G_1, D_1) \sqsubseteq_{\mathsf{E}} (\mathcal{A}_2, G_2, D_2) \stackrel{\Delta}{\iff} \mathcal{A}_1 = \mathcal{A}_2 \land G_2 \subseteq G_1 \land D_2 \subseteq D_1$$

This refinement on extended histories is used to define our refinement relation $\sqsubseteq_{\mathsf{c}}$ on blocks, def. (8).

### **5.3 Finiteness**

**Theorem 4** (Finiteness)**.** *If for a block B and state σ the set of thread-local executions ⟨B, σ⟩ is finite, then so is the set of resulting block-local executions,* $\{X \mid \exists \mathcal{A}, S.\ X \in [\![B, \mathcal{A}, \emptyset, S]\!] \land \mathsf{cut}(X)\}$.

*Proof (sketch).* It is easy to see that for a given thread-local execution there are finitely many possible visible reads and writes. Any two non-visible writes must be separated in mo by at least one visible write, limiting their number.

Theorem 4 means that any transformation can be checked automatically if the two blocks have finite sets of thread-local executions. We assume a finite data domain, meaning actions can only take finitely many distinct values in Val. Recall also that our language does not include loops. Given these facts, any transformation written in our language will satisfy finiteness, and can therefore be automatically checked.

### **6 Prototype Verification Tool**

Stellite is our prototype tool that verifies transformations using the Alloy\* model checker [12, 18]. Our tool takes an input transformation $B_2 \rightsquigarrow B_1$ written in a C-like syntax. It automatically converts the transformation into an Alloy\* model encoding $B_1 \sqsubseteq_{\mathsf{c}} B_2$. If the tool reports success, then the transformation is verified for unboundedly large syntactic contexts and executions.

An Alloy model consists of a collection of predicates on relations, and an instance of the model is a set of relations that satisfy the predicates. As previously noted in [28], there is therefore a natural fit between Alloy models and axiomatic memory models.

At a high level, our tool works as follows:

1. The two sides of an input transformation B<sup>1</sup> and B<sup>2</sup> are automatically converted into Alloy predicates expressing their syntactic structure. Intuitively, these block predicates are built by following the thread-local semantics from Sect. 3.


The Alloy\* solver is parameterised by the maximum size of the model it will examine. However, our finiteness theorem for $\sqsubseteq_{\mathsf{c}}$ (Theorem 4) means there is a bound on the size of cut-down context that needs to be considered to verify any given transformation. If our tool reports that a transformation is correct, it is verified in all syntactic contexts of unbounded size.

Given a query $B_1 \sqsubseteq_{\mathsf{c}} B_2$, the required context bound grows in proportion to the number of internal actions on distinct locations in $B_1$. This is because our cutting predicate permits context actions if they interact with internal actions, either directly, or by interleaving between internal actions. In our experiments we run the tool with a model bound of 10, sufficient to give soundness for all the transformations we consider. Note that most of our example transformations do not require such a large bound, and execution times improve if it is reduced.

If a counter-example is discovered, the problematic execution and history can be viewed using the Alloy model visualiser, which has a similar appearance to the execution diagrams in this paper. The output model generated by our tool encodes the history of $B_1$ for which no history of $B_2$ could be found. As $\sqsubseteq_{\mathsf{c}}$ is not fully abstract, this counter-example could, of course, be spurious.

Stellite currently supports transformations on code-blocks with atomic reads, writes, and fences. It does not yet support code-blocks with non-atomic accesses (see Sect. 7), LL-SC, or branching control-flow. We believe supporting the above features would not present fundamental difficulties, since the structure of the Alloy encoding would be similar. Despite the above limitations, our prototype demonstrates that our cut-down denotation can be used for automatic verification of important program transformations.

*Experimental Results.* We have tested our tool on a range of different transformations. A table of experimental results is given in Fig. 9. Many of our examples are derived from [23] – we cover all their examples that fit into our tool's input language. Transformations of the sort that we check have led to real-world bugs in GCC [19] and LLVM [8]. Note that some transformations are invalid because of their effect on local variables, e.g. skip ⇝ l := load(x). The closely related transformation skip ⇝ load(x) throws away the result of the read, and is consequently valid.<sup>2</sup>

Our tool takes significant time to verify some of the above examples, and two of the transformations cause the tool to time out. This is due to the complexity and non-determinism of the C11 model. In particular, our execution times are comparable to existing C++ model *simulators* such as Cppmem when they run on a few lines of code [3]. However, our tool is a sound transformation verifier, rather than a simulator, and thus solves a more difficult problem: transformations

**Fig. 9.** Results from executing Stellite on a 32-core 2.3 GHz AMD Opteron, with 128 GB RAM, over Linux 3.13.0-88 and Java 1.8.0_91. load/store/fence are abbreviated to ld/st/fc. ✓ and ✗ denote whether the transformation satisfies $\sqsubseteq_{\mathsf{c}}$. ∞ denotes a timeout after 8 h.

are verified for unboundedly large syntactic contexts and executions, rather than for a single execution.

### **7 Transformations with Non-atomics**

We now extend our approach to *non-atomic* (i.e. unsynchronised) accesses. C11 non-atomics are intended to enable sequential compiler optimisations that would otherwise be unsound in a concurrent context. To achieve this, any concurrent read-write or write-write pair of non-atomic actions on the same location is declared a *data race*, which causes the whole program to have undefined behaviour. Therefore, adding non-atomics impacts not just the model, but also our denotation.

### **7.1 Memory Model with Non-atomics**

Non-atomic loads and stores are added to the model by introducing new commands store<sub>NA</sub>(x, l) and l := load<sub>NA</sub>(x) and the corresponding kinds of actions: store<sub>NA</sub>, load<sub>NA</sub> ∈ Kind. We let NA be the set of all actions of these kinds. We partition global variables so that each is accessed either only by non-atomics or only by atomics. We do not permit non-atomic LL-SC operations. Two new validity axioms ensure that non-atomics read from writes that happen before them, but not from stale writes:

**Fig. 10.** *Top left*: augmented MP, with non-atomic accesses to x, and a new racy load. *Top right*: the same code optimised with *B*<sup>2</sup> *B*1. *Below each:* a valid execution.

– RFHBNA: $\forall w, r \in \mathsf{NA}.\ w \xrightarrow{\mathsf{rf}} r \implies w \xrightarrow{\mathsf{hb}} r$

– CoherNA: $\neg\exists w_1, w_2, r \in \mathsf{NA}.\ w_1 \xrightarrow{\mathsf{hb}} w_2 \xrightarrow{\mathsf{hb}} r \,\land\, w_1 \xrightarrow{\mathsf{rf}} r$

Modification order (mo) does not cover non-atomic accesses, and we change the definition of happens-before (hb), so that non-atomic loads do not add edges to it:

– HBdef: hb = (sb ∪ (rf ∩ {(w, r) | w, r ∉ NA}))<sup>+</sup>

Consider the code on the left in Fig. 10: it is similar to MP from Fig. 1, but we have removed the if-statement, made all accesses to x non-atomic, and we have added an additional load of x at the start of the right-hand thread. The valid execution of this code on the left-hand side demonstrates the additions to the model for non-atomics:


The most significant change to the model is the introduction of a *safety axiom*, data-race freedom (DRF). This forbids non-atomic read-write and write-write pairs that are unordered in hb:

$$\text{DRF: } \forall u, v \in \mathcal{A}.\ \begin{pmatrix} \exists x.\ u \neq v \land u = \mathsf{store}_{\mathsf{NA}}(x, \_) \ \land \\ v \in \{ \mathsf{load}_{\mathsf{NA}}(x, \_),\ \mathsf{store}_{\mathsf{NA}}(x, \_) \} \end{pmatrix} \implies \big( u \xrightarrow{\mathsf{hb}} v \lor v \xrightarrow{\mathsf{hb}} u \big)$$

We write safe(X) if an execution satisfies this axiom. Returning to the left of Fig. 10, we see that there is a violation of DRF – a race on non-atomics – between the first load of x and the store of x on the left-hand thread.
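The DRF axiom is a simple pairwise check. The sketch below is ours (illustrative encoding, not the paper's): it flags the race of Fig. 10, where a non-atomic load of x and a store of x are unordered in hb:

```python
def safe(actions, hb):
    """DRF: same-variable non-atomic pairs, one a store, must be hb-ordered.

    actions: id -> (kind, variable); hb: transitively closed pair set.
    """
    for u, (ku, xu) in actions.items():
        for v, (kv, xv) in actions.items():
            if (u != v and xu == xv and ku == "storeNA"
                    and kv in ("storeNA", "loadNA")
                    and (u, v) not in hb and (v, u) not in hb):
                return False
    return True

acts = {"s": ("storeNA", "x"), "r": ("loadNA", "x")}
print(safe(acts, set()))         # → False: a racy, unordered pair
print(safe(acts, {("s", "r")}))  # → True: the store happens before the load
```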

Let $[\![P]\!]^{\mathsf{NA}}_{\mathsf{v}}$ be defined in the same way as $[\![P]\!]$ is in Sect. 3, def. (3), but adding the axioms RFHBNA and CoherNA and substituting the changed axiom HBdef. Then the semantics $[\![P]\!]$ of a program with non-atomics is:

$$[\![P]\!] \stackrel{\Delta}{=} \text{if } \forall X \in [\![P]\!]^{\mathsf{NA}}_{\mathsf{v}}.\ \mathsf{safe}(X) \text{ then } [\![P]\!]^{\mathsf{NA}}_{\mathsf{v}} \text{ else } \top$$

The undefined behaviour subsumes all others, so any program observationally refines a racy program. Hence we modify our notion of observational refinement on whole programs:

$$P_1 \sqsubseteq_{\mathsf{pr}}^{\mathsf{NA}} P_2 \iff \big(\mathsf{safe}(P_2) \implies (\mathsf{safe}(P_1) \land P_1 \sqsubseteq_{\mathsf{pr}} P_2)\big)$$

This always holds when $P_2$ is unsafe; otherwise, it requires $P_1$ to preserve safety and observations to match. We define observational refinement on blocks, $\sqsubseteq_{\mathsf{bl}}^{\mathsf{NA}}$, by lifting $\sqsubseteq_{\mathsf{pr}}^{\mathsf{NA}}$ as per Sect. 2, def. (2).
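The safety-dependent semantics and the refinement above are simple enough to state executably. In the sketch below (our own encoding; `TOP` stands for $\top$), a program is a collection of executions tagged with a safety flag, and plain observational refinement is supplied as a pre-computed fact:

```python
TOP = "undefined"  # ⊤: the denotation subsuming all behaviours

def sem_na(executions):
    """⟦P⟧ ≜ if every execution in ⟦P⟧_v^NA is safe then ⟦P⟧_v^NA else ⊤."""
    return executions if all(safe for (safe, _) in executions) else TOP

def obs_refines_na(safe_p1, safe_p2, obs_refines):
    """P1 ⊑_pr^NA P2 ⟺ (safe(P2) ⟹ (safe(P1) ∧ P1 ⊑_pr P2))."""
    return (not safe_p2) or (safe_p1 and obs_refines)

# A racy P2 is refined by anything; a safe P2 demands a safe, refining P1.
print(obs_refines_na(False, False, False))    # True: P2 unsafe
print(obs_refines_na(False, True, True))      # False: P1 loses safety
print(sem_na([(True, "h1"), (False, "h2")]))  # 'undefined': one racy execution
```

The last line shows the "one race poisons everything" behaviour: a single unsafe execution collapses the whole denotation to $\top$.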

#### **7.2 Denotation with Non-atomics**

We now define our denotation for non-atomics, $\sqsubseteq_{\mathsf{q}}^{\mathsf{NA}}$, building on the 'quantified' denotation $\sqsubseteq_{\mathsf{q}}$ defined in Sect. 4. (We have also defined a finite variant of this denotation using the cutting strategy described in Sect. 5 – we leave this to [10, Sect. C].)

Non-atomic actions do not participate in happens-before (hb) or coherence order (mo). For this reason, we need not change the structure of the history. However, non-atomics introduce undefined behaviour, which is a special kind of observable behaviour. If a block races with its context in some execution, the whole program becomes unsafe, for all executions. Therefore, our denotation must identify how a block may race with its context. In particular, for the denotation to be adequate, for any context $C$ and any two blocks with $B_1 \sqsubseteq_{\mathsf{q}}^{\mathsf{NA}} B_2$, we must have that if $C(B_1)$ is racy, then $C(B_2)$ is also racy.

To motivate the precise definition of $\sqsubseteq_{\mathsf{q}}^{\mathsf{NA}}$, we consider the following (sound) 'anti-roach-motel' transformation<sup>4</sup>, noting that it might be applied to the right-hand thread of the code in the left of Fig. 10:

$$\begin{aligned} B_2:\;& \mathtt{l1 := load_{NA}(x);\ l2 := load(y);\ l3 := load_{NA}(x)} \\ \leadsto\quad B_1:\;& \mathtt{l1 := load_{NA}(x);\ l3 := load_{NA}(x);\ l2 := load(y)} \end{aligned}$$

<sup>4</sup> This example was provided to us by Lahav, Giannarakis and Vafeiadis in personal communication.

In a standard roach-motel transformation [25], operations are moved into a synchronised block. This is sound because it only introduces new happens-before ordering between events, thereby restricting the executions of the program and preserving data-race freedom. In the above transformation, the second NA load of x is moved past the atomic load of y, effectively *out* of the synchronised block, reducing happens-before ordering and possibly introducing new races. However, this is sound, because any data race generated by $B_1$ must have already occurred with the first NA load of x, matching a racy execution of $B_2$. Verifying this transformation requires that we reason about races, so $\sqsubseteq_{\mathsf{q}}^{\mathsf{NA}}$ must account for both racy and non-racy behaviour.

The code on the left of Fig. 10 represents a context, composed with $B_2$, and the execution of Fig. 10 demonstrates that together they are racy. If we were to apply our transformation to the fragment $B_2$ of the right-hand thread, then we would produce the code on the right in Fig. 10. On the right in Fig. 10, we present a similar execution to the one given on the left. The reordering on the right-hand thread has led to the second load of x taking the value 0 rather than 1, in accordance with RFHBNA. Note that the execution still has a race on the first load of x, albeit with different following events. As this example illustrates, when considering racy executions in the definition of $\sqsubseteq_{\mathsf{q}}^{\mathsf{NA}}$, we may need to match executions of the two code-blocks that behave differently after a race. This is the key subtlety in our definition of $\sqsubseteq_{\mathsf{q}}^{\mathsf{NA}}$.

In more detail, for two related blocks $B_1 \sqsubseteq_{\mathsf{q}}^{\mathsf{NA}} B_2$, if $B_2$ generates a race in a block-local execution under a given (reduced) context, then we require $B_1$ and $B_2$ to have corresponding histories *only up to the point the race occurs*. Once the race has occurred, the subsequent behaviours of $B_1$ and $B_2$ may differ. This still ensures adequacy: when the blocks $B_1$ and $B_2$ are embedded into a syntactic context $C$, a race can be reproduced in $C(B_2)$, and hence $C(B_1) \sqsubseteq_{\mathsf{pr}}^{\mathsf{NA}} C(B_2)$.

By default, C11 executions represent a program's complete behaviour to termination. To allow us to compare executions up to the point a race occurs, we use *prefixes* of executions. We therefore introduce the *downclosure* $X^{\downarrow}$, the set of $(\mathsf{hb} \cup \mathsf{rf})^{+}$-prefixes of an execution $X$:

$$X^\downarrow \stackrel{\Delta}{=} \{ X' \mid \exists \mathcal{A}. X' = X \vert\_{\mathcal{A}} \land \forall (u, v) \in (\mathsf{hb}(X) \cup \mathsf{rf}(X))^+. (v \in \mathcal{A} \Rightarrow u \in \mathcal{A}) \}$$

Here $X|_{\mathcal{A}}$ is the projection of the execution $X$ to actions in $\mathcal{A}$. We lift the downclosure to sets of executions in the standard way.
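On finite executions the downclosure can be enumerated directly: keep exactly those action subsets that are closed under $(\mathsf{hb} \cup \mathsf{rf})^{+}$-predecessors. The sketch below is our own rendering of that definition; the helper names are assumptions.

```python
from itertools import chain, combinations, product

def transitive_closure(edges):
    """Least transitive relation containing the given (u, v) edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def downclosure(actions, hb, rf):
    """X↓: all restrictions X|_A where A is closed under predecessors in
    (hb(X) ∪ rf(X))+ — i.e. if v ∈ A and u precedes v, then u ∈ A."""
    order = transitive_closure(hb | rf)
    def prefix_closed(A):
        return all(u in A for (u, v) in order if v in A)
    subsets = chain.from_iterable(
        combinations(actions, n) for n in range(len(actions) + 1))
    return [frozenset(A) for A in subsets if prefix_closed(frozenset(A))]

# A linear execution a -hb-> b -rf-> c has exactly its four prefixes:
prefixes = downclosure(["a", "b", "c"], {("a", "b")}, {("b", "c")})
print(sorted(len(p) for p in prefixes))  # [0, 1, 2, 3]
```

For instance, the subset {b} is rejected because its hb-predecessor a is missing, while {a, b} is a legitimate prefix.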

Now we define our refinement relation $B_1 \sqsubseteq_{\mathsf{q}}^{\mathsf{NA}} B_2$ as follows:

$$\begin{array}{c} B_1 \sqsubseteq_{\mathsf{q}}^{\mathsf{NA}} B_2 \iff \forall \mathcal{A}, R, S.\ \forall X_1 \in [B_1, \mathcal{A}, R, S]_{v}^{\mathsf{NA}}.\ \exists X_2 \in [B_2, \mathcal{A}, R, S]_{v}^{\mathsf{NA}}. \\ (\mathsf{safe}(X_2) \implies \mathsf{safe}(X_1) \land \mathsf{hist}(X_1) \sqsubseteq_{\mathsf{h}} \mathsf{hist}(X_2)) \land {} \\ (\neg \mathsf{safe}(X_2) \implies \exists X'_2 \in (X_2)^{\downarrow}.\ \exists X'_1 \in (X_1)^{\downarrow}. \\ \neg \mathsf{safe}(X'_2) \land \mathsf{hist}(X'_1) \sqsubseteq_{\mathsf{h}} \mathsf{hist}(X'_2)) \end{array}$$

In this definition, for each execution $X_1$ of block $B_1$, we witness a related execution $X_2$ of block $B_2$. The relationship depends on whether $X_2$ is safe or unsafe.
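The two cases can be phrased as a check over abstract executions. In this sketch (entirely our own encoding), each execution carries a safety flag, a complete history, and its prefix histories, each tagged with safety; `hist_leq` stands in for the history refinement $\sqsubseteq_{\mathsf{h}}$, and the reduced context $(\mathcal{A}, R, S)$ is held fixed:

```python
def matches(x1, x2, hist_leq):
    """Relate one X1/X2 pair per the definition of ⊑_q^NA."""
    if x2["safe"]:
        # Safe case: X1 must be safe with a related complete history.
        return x1["safe"] and hist_leq(x1["hist"], x2["hist"])
    # Unsafe case: some racy prefix X2' of X2 must relate to some prefix X1'.
    return any(not s2 and any(hist_leq(h1, h2) for (_, h1) in x1["prefixes"])
               for (s2, h2) in x2["prefixes"])

def refines_q_na(execs1, execs2, hist_leq):
    """B1 ⊑_q^NA B2 at a fixed reduced context: every execution of B1
    is matched by some execution of B2."""
    return all(any(matches(x1, x2, hist_leq) for x2 in execs2)
               for x1 in execs1)

# Complete histories differ ("h-full1" vs "h-full2"), but X2 is racy and
# both executions share the pre-race prefix history "h-pre":
x1 = {"safe": True, "hist": "h-full1",
      "prefixes": [(True, "h-pre"), (True, "h-full1")]}
x2 = {"safe": False, "hist": "h-full2",
      "prefixes": [(False, "h-pre"), (False, "h-full2")]}
print(refines_q_na([x1], [x2], lambda a, b: a == b))  # True
```

The example mirrors Fig. 11: the complete histories disagree after the race, but a racy prefix of $X_2$ matches a prefix of $X_1$, which is all the unsafe case demands.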

**Fig. 11.** History comparison for an NA-based program transformation


Recall the transformation $B_2 \leadsto B_1$ given above. To verify it, we must establish that $B_1 \sqsubseteq_{\mathsf{q}}^{\mathsf{NA}} B_2$. As before, we illustrate the reasoning for a single block-local execution – verifying the transformation would require a proof for all block-local executions.

In Fig. 11 we give an execution $X_1 \in [B_1, \mathcal{A}, R, S]_{v}^{\mathsf{NA}}$, with a context action set $\mathcal{A}$ consisting of a non-atomic store of x = 1 and an atomic store of y = 1, and a context relation $R$ relating the store of x to the store of y. Note that this choice of context actions matches the left-hand thread in the code listings of Fig. 10, and there are data races between the loads and the store on x.

To prove the refinement for this execution, we exhibit a corresponding unsafe execution $X_2 \in [B_2, \mathcal{A}, R, S]_{v}^{\mathsf{NA}}$. The histories of the *complete* executions $X_1$ and $X_2$ differ in their return action. In $X_2$ the load of y takes the value of the context store, so CoherNA forces the second load of x to read from the context store of x. This changes the values of local variables recorded in ret. However, because $X_2$ is unsafe, we can select a prefix $X'_2$ which includes the race (we denote in grey the parts that we do not include). Similarly, we can select a prefix $X'_1$ of $X_1$. We have that $\mathsf{hist}(X'_1) = \mathsf{hist}(X'_2)$ (shown in the figure), even though the histories $\mathsf{hist}(X_1)$ and $\mathsf{hist}(X_2)$ do not correspond.

**Theorem 5** (Adequacy of $\sqsubseteq_{\mathsf{q}}^{\mathsf{NA}}$)**.** $B_1 \sqsubseteq_{\mathsf{q}}^{\mathsf{NA}} B_2 \implies B_1 \sqsubseteq_{\mathsf{bl}}^{\mathsf{NA}} B_2$.

**Theorem 6** (Full abstraction of $\sqsubseteq_{\mathsf{q}}^{\mathsf{NA}}$)**.** $B_1 \sqsubseteq_{\mathsf{bl}}^{\mathsf{NA}} B_2 \implies B_1 \sqsubseteq_{\mathsf{q}}^{\mathsf{NA}} B_2$.

We prove Theorem 5 in [10, Sect. B] and Theorem 6 in [10, Sect. F]. Note that the prefixing in our definition of $\sqsubseteq_{\mathsf{q}}^{\mathsf{NA}}$ is required for full abstraction – but it would still be adequate to always require *complete* executions with related histories.

### **8 Full Abstraction**

The key idea of our proofs of full abstraction (Theorems 2 and 6, given in full in [10, Sect. F]) is to construct a special syntactic context that is sensitive to one particular history. Namely, given an execution X produced from a block B with context happens-before R, this context $C_X$ guarantees: (1) that X is the block portion of an execution of $C_X(B)$; and (2) for any block $B'$, if $C_X(B')$ has a different block history from X, then this is visible in different observable behaviour. Therefore for any blocks that are distinguished by different histories, $C_X$ can produce a program with different observable behaviour, establishing full abstraction.

*Special Context Construction.* The precise definition of the special context construction $C_X$ is given in [10, Sect. F] – here we sketch its behaviour. $C_X$ executes the context operations from X in parallel with the block. It wraps these operations in auxiliary wrapper code to enforce the context happens-before relation R and to check the history. If the wrapper code fails, it writes to an error variable, thereby altering the observable behaviour.

The context must generate the edges in R. This is enforced by wrappers that use watchdog variables to create hb-edges: each edge $(u, v) \in R$ is replicated by a write and a read on a variable $h_{(u,v)}$. If the read of $h_{(u,v)}$ does not read the write, then the error variable is written. The shape of a successful read is given on the left in Fig. 12.
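As a schematic illustration of this wrapper scheme (not the paper's actual construction, which is deferred to [10, Sect. F]), the sketch below instruments a list of context actions with a watchdog write after u and a checked read before v for each edge in R; all names and the pseudo-instruction syntax are hypothetical:

```python
def wrap_with_watchdogs(context_actions, R):
    """For each (u, v) in R, emit a write to watchdog h_(u,v) just after u
    and a checked read of h_(u,v) just before v; a failed check writes the
    error variable, changing observable behaviour."""
    wrapped = {a: [a] for a in context_actions}
    for (u, v) in R:
        wrapped[u].append(f"store(h_{u}_{v}, 1)")
        wrapped[v].insert(0, f"if load(h_{u}_{v}) != 1 then store(error, 1)")
    return wrapped

# Replicating a single context edge (u, v) ∈ R:
prog = wrap_with_watchdogs(["u", "v"], [("u", "v")])
print(prog["u"])     # ['u', 'store(h_u_v, 1)']
print(prog["v"][0])  # 'if load(h_u_v) != 1 then store(error, 1)'
```

If v's read observes u's watchdog write, the rf edge on $h_{(u,v)}$ recreates the intended hb-edge; otherwise the error write makes the deviation observable.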

**Fig. 12.** The execution shapes generated by the special context for, on the *left*, generation of *R*, and on the *right*, errant history edges.

The context must also prohibit history edges beyond those in the original guarantee G, and again it uses watchdog variables. For each (u, v) *not* in G, the special context writes to a watchdog variable $g_{(u,v)}$ before u and reads $g_{(u,v)}$ after v. If the read of $g_{(u,v)}$ *does* read the value written before u, then there is an errant history edge, and the error location is written. An erroneous execution has the shape given on the right in Fig. 12 (omitting the write to the error location).

*Full Abstraction and LL-SC.* Our proof of full abstraction for the language with C11 non-atomics requires the language to also include LL-SC, not just C11's standard CAS: the former operation increases the observational power of the context. However, *without* non-atomics (Sect. 4), CAS would be sufficient to prove full abstraction.

### **9 Related Work**

Our approach builds on our prior work [3], which generalises linearizability [11] to the C11 memory model. This work represented interactions between a library and its clients by sets of histories consisting of a guarantee and a deny; we do the same for code-block and context. However, our previous work assumed *information hiding*, i.e., that the variables used by the library cannot be directly accessed by clients; we lift this assumption here. We also establish both adequacy and full abstraction, propose a finite denotation, and build an automated verification tool.

Our approach is similar in structure to the seminal concurrency semantics of Brookes [6]: i.e. a code block is represented by a denotation capturing possible interactions with an abstracted context. In [6], denotations are sets of traces, consisting of sequences of global program states; context actions are represented by changes in these states. To handle the more complex axiomatic memory model, our denotation consists of sets of context actions and relations on them, with context actions explicitly represented as such. Also, in order to achieve full abstraction, Brookes assumes a powerful atomic await() instruction which blocks until the global state satisfies a predicate. Our result does not require this: all our instructions operate on single locations, and our strongest instruction is LL-SC, which is commonly available on hardware.

Brookes-like approaches have been applied to several relaxed models: operational hardware models [7], TSO [13], and SC-DRF [21]. Also, [7,21] define tools for verifying program transformations. All three approaches are based on traces rather than partial orders, and are therefore not directly portable to C11-style axiomatic memory models. All three also target substantially stronger (i.e. more restrictive) models.

Methods for verifying code transformations, either manually or using proof assistants, have been proposed for several relaxed models: TSO [24,26,27], Java [25] and C/C++ [23]. These methods are non-compositional in the sense that verifying a transformation requires considering the trace set of the entire program—there is no abstraction of the context. We abstract both the sequential and concurrent context and thereby support automated verification. The above methods also model transformations as rewrites on program executions, whereas we treat them directly as modifications of program syntax; the latter corresponds more closely to actual compilers. Finally, these methods all require considerable proof effort; we build an automated verification tool.

Our tool is a sound verification tool – that is, transformations are verified for all contexts and all executions of unbounded size. Several tools exist for testing (not verifying) program transformations on axiomatic memory models by searching for counter-examples to correctness, e.g., [16] for GCC and [8] for LLVM. Alloy was used by [28] in a testing tool for comparing memory models – this includes comparing language-level constructs with their compiled forms.

### **10 Conclusions**

We have proposed the first fully abstract denotational semantics for an axiomatic relaxed memory model, and using this, we have built the first tool capable of automatically verifying program transformations on such a model. Our theory lays the groundwork for further research into the properties of axiomatic models. In particular, our definition of the denotation as a set of histories and our context reduction should be portable to other axiomatic models based on happens-before, such as those for hardware [1].

**Acknowledgements.** Thanks to Jeremy Jacob, Viktor Vafeiadis, and John Wickerson for comments and suggestions. Dodds was supported by a Royal Society Industrial Fellowship, and undertook this work while faculty at the University of York. Batty is supported by a Lloyds Register Foundation and Royal Academy of Engineering Research Fellowship.

### **References**


Compositional Verification of Compiler Optimisations on Relaxed Memory 1055

28. Wickerson, J., Batty, M., Sorensen, T., Constantinides, G.A.: Automatically comparing memory consistency models. In: Symposium on Principles of Programming Languages (POPL), pp. 190–204 (2017)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### Author Index

Abdulla, Parosh Aziz 442 Aguado, Joaquín 86 Aguirre, Alejandro 214 Bagnall, Alexander 561 Barthe, Gilles 117, 214 Batty, Mark 1027 Batz, Kevin 186 Bi, Xuan 3 Bichsel, Benjamin 145 Birkedal, Lars 214, 475 Bizjak, Aleš 214 Brunet, Paul 856 Charguéraud, Arthur 533 Chatterjee, Krishnendu 739 Chen, Tzu-Chun 799 Clebsch, Sylvan 885 Devriese, Dominique 475 Dodds, Mike 1027 Doko, Marko 357 Drossopoulou, Sophia 885 Eilers, Marco 502 Espitau, Thomas 117 Esteves-Verissimo, Paulo 619 Eugster, Patrick 799 Foster, Jeffrey S. 653 Franco, Juliana 885 Gaboardi, Marco 117, 214 García-Pérez, Álvaro 912 Garg, Deepak 214 Gehr, Timon 145 Goharshady, Amir Kafshdar 739 Gommerstadt, Hannah 771 Gotsman, Alexey 912, 1027 Grégoire, Benjamin 117 Guéneau, Armaël 533

Hamin, Jafar 415 Hicks, Michael 653 Hitz, Samuel 502 Hobor, Aquinas 385 Hsu, Justin 117 Hu, Raymond 799 Hupel, Lars 999

Jabs, Julian 60 Jacobs, Bart 415 Jagadeesan, Radha 968 Jia, Limin 771 Jonsson, Bengt 442

Kaminski, Benjamin Lucien 186 Kappé, Tobias 856 Karachalias, Georgios 327 Katoen, Joost-Pieter 186 Kobayashi, Naoki 711

Lahav, Ori 357, 940 Le, Xuan-Bach 385

Mardziel, Piotr 653 Matheja, Christoph 186 Matsuda, Kazutaka 31 Mendler, Michael 86 Merten, Samuel 561 Meshman, Yuri 912 Moore, Brandon 589 Müller, Peter 502, 683

Nipkow, Tobias 999

Oliveira, Bruno C. d. S. 3, 272 Ostermann, Klaus 60

Pédrot, Pierre-Marie 245 Peña, Lucas 589 Pfenning, Frank 771 Pichon-Pharabod, Jean 357 Pottier, François 533 Pouzet, Marc 86 Pretnar, Matija 327

Raad, Azalea 940 Rahli, Vincent 619 Riely, James 968 Roop, Partha 86 Rosu, Grigore 589 Ruef, Andrew 653

Saleh, Amr Hany 327 Schrijvers, Tom 327 Sergey, Ilya 912 Silva, Alexandra 856 Simpson, Alex 300 Skorstengaard, Lau 475 Stewart, Gordon 561 Strub, Pierre-Yves 117 Svendsen, Kasper 357 Tabareau, Nicolas 245

Toninho, Bernardo 827 Trinh, Cong Quy 442 Tsukada, Takeshi 711

Urban, Caterina 683

Vafeiadis, Viktor 357, 940 Vechev, Martin 145 Velner, Yaron 739 Viering, Malte 799 Vitek, Jan 885 Völp, Marcus 619 von Hanxleden, Reinhard 86 Voorneveld, Niels 300 Vukotic, Ivana 619

Wang, Meng 31 Watanabe, Keiichi 711 Wei, Shiyi 653 Wrigstad, Tobias 885

Xie, Ningning 3, 272

Yoshida, Nobuko 827

Zanasi, Fabio 856 Ziarek, Lukasz 799